Skip to content

fix(db): deduplicate loadedSubsets and join key requests#1554

Open
v-anton wants to merge 3 commits into
TanStack:mainfrom
v-anton:fix/loaded-subsets-dedup
Open

fix(db): deduplicate loadedSubsets and join key requests#1554
v-anton wants to merge 3 commits into
TanStack:mainfrom
v-anton:fix/loaded-subsets-dedup

Conversation

@v-anton
Copy link
Copy Markdown

@v-anton v-anton commented May 25, 2026

🎯 Changes

Supersedes #1205.

Two related fixes for unbounded loadedSubsets growth:

  1. subscription.ts: requestSnapshot and loadNextItems now check isPredicateSubset before pushing to loadedSubsets, skipping entries already covered by an existing superset predicate.

  2. joins.ts: A loadedJoinKeys Set tracks keys already loaded across pipeline batches. Subsequent batches filter out known keys before calling requestSnapshot, preventing repeated requests for the same join keys.

Without these fixes, join lazy-loading accumulates duplicate LoadSubsetOptions entries, causing SQL queries and query keys to grow unboundedly — eventually hitting "too many SQL variables" errors.

✅ Checklist

  • I have tested this code locally with pnpm test:pr.
  • I have followed the steps in the Contributing guide.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Summary by CodeRabbit

  • Bug Fixes

    • Prevented redundant subset and join-key load requests during subscriptions and lazy joins, stopping unnecessary growth of tracked loads and avoiding repeated snapshot operations. Loading status still reflects concurrent on-demand loads until all complete.
  • Tests

    • Added and updated tests validating deduplication, preload behavior, and concurrent-load handling in live query join scenarios.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 6f60ab4e-6121-4bfc-a272-10399e7cd820

📥 Commits

Reviewing files that changed from the base of the PR and between 4c1262e and 96350c1.

📒 Files selected for processing (1)
  • packages/db/tests/query/loaded-subsets-dedup.test.ts

📝 Walkthrough

Walkthrough

This PR deduplicates subset loads: subscription snapshot requests now skip when an existing loaded subset covers the predicate, and join lazy-loading tracks loaded join keys to avoid redundant indexed snapshot requests. Tests and a changeset were added.

Changes

Prevent loadedSubsets unbounded growth through deduplication

Layer / File(s) Summary
Subscription predicate deduplication
packages/db/src/collection/subscription.ts
Import isPredicateSubset and apply it in requestSnapshot and requestLimitedSnapshot so _sync.loadSubset, onLoadSubsetResult, loadedSubsets.push, and promise tracking happen only when the new predicate isn't already covered.
Join key deduplication tracking
packages/db/src/query/compiler/joins.ts
Track loadedJoinKeys during tap interception: deduplicate/filter nulls, exclude already-loaded keys, request indexed snapshots for remaining keys, and record successfully-loaded keys; fallback to full-collection load on index-unavailable.
Deduplication validation tests
packages/db/tests/query/loaded-subsets-dedup.test.ts, packages/db/tests/collection-subscription.test.ts
Add Vitest fixtures and tests that intercept loadSubset calls to assert in-filter decomposition and that repeated join-key/subset requests do not cause unbounded loadedSubsets growth; update a concurrent-load test to use distinct where expressions.
Release notes
.changeset/fix-loaded-subsets-dedup.md
Add a changeset entry declaring a patch bump for @tanstack/db describing the loadedSubsets deduplication fix.

Sequence Diagram(s):

sequenceDiagram
  participant Subscription
  participant _sync
  participant loadedSubsets
  Subscription->>loadedSubsets: check isPredicateSubset(existing, new)
  alt covered
    Subscription-->>Subscription: skip _sync.loadSubset
  else not covered
    Subscription->>_sync: _sync.loadSubset(loadOptions)
    _sync-->>Subscription: onLoadSubsetResult
    Subscription->>loadedSubsets: push(loadOptions)
  end
Loading
sequenceDiagram
  participant ActiveTap
  participant LazyCollection
  participant loadedJoinKeys
  ActiveTap->>ActiveTap: collect unique non-null joinKeys
  ActiveTap->>loadedJoinKeys: exclude already-loaded keys
  alt keysToLoad non-empty
    ActiveTap->>LazyCollection: requestSnapshot(where: inArray(keysToLoad))
    alt loaded
      LazyCollection-->>loadedJoinKeys: record loaded keys
    else not-loaded
      LazyCollection-->>ActiveTap: fallback to full collection request
    end
  end
Loading

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • samwillis

"I nibbled through duplicates so loads are neat,
Keys remembered, repeats face quick defeat.
Subsets stay tidy, no needless repeat,
Joins recall keys, so fetches are sweet.
A rabbit's soft clap for a cleanup complete 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: deduplicating loadedSubsets and join key requests to prevent unbounded growth.
Description check ✅ Passed The description covers changes in detail, includes motivation, follows the template structure with required sections, and confirms testing and changeset generation.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/db/src/collection/subscription.ts (1)

380-391: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Dedup bookkeeping happens after issuing loadSubset, which can break unload symmetry.

loadSubset(loadOptions) is executed before dedup checks, so duplicate/subset requests can still be issued but not recorded in loadedSubsets. On unsubscribe, those untracked loads won’t get matching unload calls.

Suggested fix
-    const syncResult = this.collection._sync.loadSubset(loadOptions)
-
-    // Pass the raw loadSubset result to the caller for external tracking
-    opts?.onLoadSubsetResult?.(syncResult)
-
-    // Track this loadSubset call so we can unload it later
     const isAlreadyCovered = this.loadedSubsets.some((existing) =>
       isPredicateSubset(loadOptions, existing),
     )
-    if (!isAlreadyCovered) {
+    if (!isAlreadyCovered) {
+      const syncResult = this.collection._sync.loadSubset(loadOptions)
+      opts?.onLoadSubsetResult?.(syncResult)
       this.loadedSubsets.push(loadOptions)
+      if (trackLoadSubsetPromise) {
+        this.trackLoadSubsetPromise(syncResult)
+      }
+    } else {
+      // Optional: report no-op load to caller if needed by caller contract
+      opts?.onLoadSubsetResult?.(true)
     }
-
-    const trackLoadSubsetPromise = opts?.trackLoadSubsetPromise ?? true
-    if (trackLoadSubsetPromise) {
-      this.trackLoadSubsetPromise(syncResult)
-    }

Apply equivalent ordering in requestLimitedSnapshot as well.

Also applies to: 623-634

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/collection/subscription.ts` around lines 380 - 391, Reorder
the dedup bookkeeping so you check isAlreadyCovered via isPredicateSubset and
update loadedSubsets before calling this.collection._sync.loadSubset — only call
loadSubset when not already covered — and still invoke opts?.onLoadSubsetResult
with the actual load result when you do call it (or a noop/early-return when
skipped); apply the same reordering/logic in requestLimitedSnapshot as well so
unload symmetry is preserved for both code paths.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/db/src/query/compiler/joins.ts`:
- Around line 321-325: The code is adding joinKeys into loadedJoinKeys before
confirming the async loadSubset succeeded, which can prevent retries when
loadSubset later rejects; in the requestSnapshot / join handling logic
(referencing requestSnapshot, loadSubset, loadedJoinKeys, and joinKeys), move
the loadedJoinKeys.add(key) updates so they only run after the async
loadSubset/requestSnapshot completes successfully (i.e., inside the success path
or then block), and ensure failed loads do not mark keys as loaded so subsequent
batches can retry.

In `@packages/db/tests/query/loaded-subsets-dedup.test.ts`:
- Around line 164-176: The test currently may pass vacuously if no 'in' filters
are produced; update the loop that iterates newCalls and inspects
extractSimpleComparisons(call.where) to assert that at least one filter with
operator === 'in' was seen. For example, introduce a counter or collect matching
filters (referencing newCalls, extractSimpleComparisons, filters and
filter.operator) and after the loop add an assertion that the count/collection
length is > 0 before relying on contains/not.toContain checks.
- Around line 94-112: The test never triggers the second snapshot request; after
calling await liveQuery.preload() capture initialCallCount as you do, then
invoke and await the second snapshot action (e.g., await
liveQuery.requestSnapshot() or the appropriate method that re-requests subsets)
to actually exercise dedup logic, then assert loadSubsetCalls.length remains
equal to initialCallCount; reference liveQuery.preload(),
liveQuery.requestSnapshot(), and loadSubsetCalls in the change.
- Line 125: Replace the unsafe cast by correctly typing parentWrite: change its
type from the ad-hoc (msg: { type: string; value: Parent }) => void to
ChangeMessageOrDeleteKeyMessage<Parent, number> (import
ChangeMessageOrDeleteKeyMessage from ../../src/types.js) and then assign
parentWrite = write (no "as any"); ensure the generic keyof/number matches the
test's Parent and TKey usage so the assignment type-checks.

---

Outside diff comments:
In `@packages/db/src/collection/subscription.ts`:
- Around line 380-391: Reorder the dedup bookkeeping so you check
isAlreadyCovered via isPredicateSubset and update loadedSubsets before calling
this.collection._sync.loadSubset — only call loadSubset when not already covered
— and still invoke opts?.onLoadSubsetResult with the actual load result when you
do call it (or a noop/early-return when skipped); apply the same
reordering/logic in requestLimitedSnapshot as well so unload symmetry is
preserved for both code paths.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5754dc72-3946-45bc-b96e-1abdc624f6a8

📥 Commits

Reviewing files that changed from the base of the PR and between be656be and f8828b3.

📒 Files selected for processing (4)
  • .changeset/fix-loaded-subsets-dedup.md
  • packages/db/src/collection/subscription.ts
  • packages/db/src/query/compiler/joins.ts
  • packages/db/tests/query/loaded-subsets-dedup.test.ts

Comment on lines +321 to +325
if (loaded) {
for (const key of joinKeys) {
loadedJoinKeys.add(key)
}
} else {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

loadedJoinKeys is updated before async load success, which can suppress needed retries.

requestSnapshot can return true even if the underlying async loadSubset later rejects. Adding keys immediately means failed keys won’t be retried in later batches.

Suggested fix
-            const loaded = lazySourceSubscription.requestSnapshot({
+            let loadResult: Promise<void> | true = true
+            const loaded = lazySourceSubscription.requestSnapshot({
               where: inArray(lazyJoinRef, joinKeys),
               optimizedOnly: true,
+              onLoadSubsetResult: (result) => {
+                loadResult = result
+              },
             })

             if (loaded) {
-              for (const key of joinKeys) {
-                loadedJoinKeys.add(key)
-              }
+              if (loadResult instanceof Promise) {
+                loadResult
+                  .then(() => {
+                    for (const key of joinKeys) loadedJoinKeys.add(key)
+                  })
+                  .catch(() => {
+                    // keep keys retriable
+                  })
+              } else {
+                for (const key of joinKeys) loadedJoinKeys.add(key)
+              }
             } else {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/query/compiler/joins.ts` around lines 321 - 325, The code is
adding joinKeys into loadedJoinKeys before confirming the async loadSubset
succeeded, which can prevent retries when loadSubset later rejects; in the
requestSnapshot / join handling logic (referencing requestSnapshot, loadSubset,
loadedJoinKeys, and joinKeys), move the loadedJoinKeys.add(key) updates so they
only run after the async loadSubset/requestSnapshot completes successfully
(i.e., inside the success path or then block), and ensure failed loads do not
mark keys as loaded so subsequent batches can retry.

Comment thread packages/db/tests/query/loaded-subsets-dedup.test.ts
Comment thread packages/db/tests/query/loaded-subsets-dedup.test.ts Outdated
Comment thread packages/db/tests/query/loaded-subsets-dedup.test.ts
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
packages/db/tests/query/loaded-subsets-dedup.test.ts (1)

100-118: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

First dedup regression test is still not exercising the second request path.

On Line 100 you call preload() once, then on Line 117 assert call count didn’t grow, but no second snapshot/preload action is performed. This can still pass vacuously and won’t catch regressions in repeated-request dedupe.

As per coding guidelines, "Always add unit tests that reproduce a bug before fixing it to ensure the bug is fixed and prevent regression".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/tests/query/loaded-subsets-dedup.test.ts` around lines 100 - 118,
The test only calls liveQuery.preload() once then asserts loadSubsetCalls length
unchanged, so it never exercises the repeated-request dedupe path; update the
test (around liveQuery.preload(), loadSubsetCalls and extractSimpleComparisons
usage) to perform the second preload/snapshot call (e.g., call
liveQuery.preload() or the snapshot-triggering helper a second time), capture
loadSubsetCalls.length before and after the second call, and assert the call
count does not increase and the second request (if present) has the same filters
as the first by reusing firstCall = loadSubsetCalls[0] and comparing
extractSimpleComparisons on any subsequent call (or asserting no new call was
pushed).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@packages/db/tests/query/loaded-subsets-dedup.test.ts`:
- Around line 100-118: The test only calls liveQuery.preload() once then asserts
loadSubsetCalls length unchanged, so it never exercises the repeated-request
dedupe path; update the test (around liveQuery.preload(), loadSubsetCalls and
extractSimpleComparisons usage) to perform the second preload/snapshot call
(e.g., call liveQuery.preload() or the snapshot-triggering helper a second
time), capture loadSubsetCalls.length before and after the second call, and
assert the call count does not increase and the second request (if present) has
the same filters as the first by reusing firstCall = loadSubsetCalls[0] and
comparing extractSimpleComparisons on any subsequent call (or asserting no new
call was pushed).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3471791a-036d-4347-979d-54625d24bc5f

📥 Commits

Reviewing files that changed from the base of the PR and between f8828b3 and 4c1262e.

📒 Files selected for processing (3)
  • packages/db/src/collection/subscription.ts
  • packages/db/tests/collection-subscription.test.ts
  • packages/db/tests/query/loaded-subsets-dedup.test.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant