
fix(sqlite): inline JSON paths for expression indexes #1487

Open
lalitkapoor wants to merge 2 commits into TanStack:main from lalitkapoor:fix/sqlite-expression-index-json-paths



lalitkapoor commented Apr 21, 2026

🎯 Changes

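Fixes a SQLite query-planner mismatch in @tanstack/db-sqlite-persistence-core: runtime queries compiled column references with bound JSON-path parameters, so they could not match persisted expression indexes, which use literal JSON paths.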

This came from a real app query against persisted threadMessages data:

```ts
q
  .from({ m: threadMessages })
  .where(({ m }) => eq(m.threadId, threadId))
  .orderBy(({ m }) => m.createdAt, 'desc')
  .orderBy(({ m }) => m.id, 'desc')
```

In the app, we first added the missing persisted filter index:

```ts
collection.createIndex((row) => row.threadId)
```

At that point, we expected SQLite to stop doing a full table scan for the filter.

The important expectation was not just “there is now an index on threadId”. It was:

  1. the persisted expression index for threadId is stored as literal-path SQL
  2. the runtime query for WHERE threadId = ? should compile to the same expression shape
  3. if those shapes match, SQLite should be able to plan SEARCH ... USING INDEX ... for the filter

The expected planner outcome after adding that index was:

  • SEARCH ... USING INDEX <threadId expression index> for the WHERE threadId = ? filter
  • possibly still USE TEMP B-TREE FOR ORDER BY, because the app query also sorts by createdAt DESC, id DESC and there is no composite persisted index for (threadId, createdAt, id)
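That expected outcome can be checked mechanically against the `detail` column of `EXPLAIN QUERY PLAN` rows. The sketch below assumes the detail strings have already been read from the plan (for example with better-sqlite3's `.all()`); `summarizePlan` and the index name `idx_thread_id` are hypothetical, and the exact wording of detail strings varies between SQLite releases.

```ts
// Hypothetical helper: summarizes the `detail` column of EXPLAIN QUERY PLAN rows.
// Assumes modern SQLite wording ("SEARCH t USING INDEX ...", "SCAN t",
// "USE TEMP B-TREE FOR ORDER BY"); older releases print "SCAN TABLE t".
interface PlanSummary {
  searchIndex: string | null // index used by a SEARCH step, if any
  scansAllRows: boolean // true if any step is a SCAN (visits every row)
  tempSortForOrderBy: boolean // true if ORDER BY needs a temporary b-tree
}

function summarizePlan(details: readonly string[]): PlanSummary {
  const summary: PlanSummary = {
    searchIndex: null,
    scansAllRows: false,
    tempSortForOrderBy: false,
  }
  for (const detail of details) {
    // Capture the index name from "SEARCH <table> USING [COVERING] INDEX <name> ..."
    const search = /^SEARCH \S+ USING (?:COVERING )?INDEX (\S+)/.exec(detail)
    if (search) {
      summary.searchIndex = search[1]
    } else if (/^SCAN /.test(detail)) {
      summary.scansAllRows = true
    }
    if (/USE TEMP B-TREE FOR ORDER BY/.test(detail)) {
      summary.tempSortForOrderBy = true
    }
  }
  return summary
}

// The plan this PR expects once the threadId index is usable (index name made up):
const expectedPlan = summarizePlan([
  'SEARCH c_z6sj6b_e USING INDEX idx_thread_id (<expr>=?)',
  'USE TEMP B-TREE FOR ORDER BY',
])
// expectedPlan.searchIndex === 'idx_thread_id'
// expectedPlan.scansAllRows === false, expectedPlan.tempSortForOrderBy === true
```

A temp b-tree for ORDER BY is still acceptable here; the signal that matters is a SEARCH step on the threadId expression index instead of a SCAN.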

The runtime query, however, was actually compiled into the following SQL:

```sql
SELECT key, value, metadata, row_version
FROM "c_z6sj6b_e"
WHERE ((CASE json_extract(value, ?)
  WHEN 'bigint' THEN CAST(json_extract(value, ?) AS NUMERIC)
  WHEN 'date' THEN json_extract(value, ?)
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, ?)
END) = ?)
ORDER BY (CASE json_extract(value, ?)
  WHEN 'bigint' THEN CAST(json_extract(value, ?) AS NUMERIC)
  WHEN 'date' THEN json_extract(value, ?)
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, ?)
END) DESC NULLS FIRST,
(CASE json_extract(value, ?)
  WHEN 'bigint' THEN CAST(json_extract(value, ?) AS NUMERIC)
  WHEN 'date' THEN json_extract(value, ?)
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, ?)
END) DESC NULLS FIRST,
key ASC
```

with params shaped like:

```ts
[
  '$.threadId.__tanstack_db_persisted_type__',
  '$.threadId.value',
  '$.threadId.value',
  '$.threadId',
  '92ebe40d-e545-40f2-b0fd-bba86c41b86e',
  '$.createdAt.__tanstack_db_persisted_type__',
  '$.createdAt.value',
  '$.createdAt.value',
  '$.createdAt',
  '$.id.__tanstack_db_persisted_type__',
  '$.id.value',
  '$.id.value',
  '$.id',
]
```
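To make that parameter list concrete, the sketch below rebuilds the same SQL and params from field names. `compileRefBound` and `compileThreadQueryBound` are hypothetical names used for illustration, not functions from `@tanstack/db-sqlite-persistence-core`; they only reproduce the pre-fix shape shown above, where each column reference contributes four bound JSON paths.

```ts
interface CompiledSql {
  sql: string
  params: unknown[]
}

const TYPE_TAG = '__tanstack_db_persisted_type__'

// Hypothetical reconstruction of the pre-fix shape: all four JSON paths of a
// column reference are bound parameters.
function compileRefBound(field: string): CompiledSql {
  return {
    sql: [
      '(CASE json_extract(value, ?)',
      "  WHEN 'bigint' THEN CAST(json_extract(value, ?) AS NUMERIC)",
      "  WHEN 'date' THEN json_extract(value, ?)",
      "  WHEN 'nan' THEN NULL",
      "  WHEN 'infinity' THEN NULL",
      "  WHEN '-infinity' THEN NULL",
      '  ELSE json_extract(value, ?)',
      'END)',
    ].join('\n'),
    params: [`$.${field}.${TYPE_TAG}`, `$.${field}.value`, `$.${field}.value`, `$.${field}`],
  }
}

// WHERE <threadId> = ? ORDER BY <createdAt> DESC, <id> DESC, key ASC
function compileThreadQueryBound(threadId: string): CompiledSql {
  const where = compileRefBound('threadId')
  const params: unknown[] = [...where.params, threadId]
  const orderBy: string[] = []
  for (const field of ['createdAt', 'id']) {
    const ref = compileRefBound(field)
    orderBy.push(`${ref.sql} DESC NULLS FIRST`)
    params.push(...ref.params)
  }
  const sql = `WHERE (${where.sql} = ?)\nORDER BY ${orderBy.join(',\n')},\nkey ASC`
  return { sql, params }
}

const compiledBound = compileThreadQueryBound('92ebe40d-e545-40f2-b0fd-bba86c41b86e')
// compiledBound.params has 13 entries in the same order as the list above:
// four paths for threadId, the thread ID, then four paths each for createdAt and id.
```

Only one of those 13 parameters is a real filter value; the other twelve are JSON paths that SQLite sees as opaque `?` placeholders.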

The persisted expression index for threadId had already been normalized into literal-path SQL such as:

```sql
json_extract(value, '$.threadId')
```

So the runtime query and the persisted index were logically equivalent but not structurally identical from SQLite's perspective. SQLite expression-index matching is shape-sensitive here: json_extract(value, ?) is not the same expression as json_extract(value, '$.threadId') for planner matching purposes.

That is why adding the missing threadId index was necessary but not sufficient: the index existed, but the framework was still compiling the runtime predicate into a form that could not match that index.
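One way to picture the mismatch is as a comparison of expression trees rather than of SQL text. The model below is a deliberate simplification (the `Expr` type and `sameShape` are hypothetical and far coarser than SQLite's internal expression comparison). It relies on one real constraint: SQLite does not allow bound parameters inside index expressions, so a `?` in the query can never line up with the literal path stored in the index.

```ts
// Simplified expression tree; an illustration, not SQLite's data structures.
type Expr =
  | { kind: 'column'; name: string }
  | { kind: 'literal'; value: string | number }
  | { kind: 'param' } // a `?` placeholder
  | { kind: 'call'; fn: string; args: Expr[] }

// Structural equality: same node kinds, same names/values, same children.
function sameShape(a: Expr, b: Expr): boolean {
  switch (a.kind) {
    case 'column':
      return b.kind === 'column' && a.name === b.name
    case 'literal':
      return b.kind === 'literal' && a.value === b.value
    case 'param':
      // Index expressions never contain parameters, so nothing matches a `?`.
      return false
    case 'call': {
      if (b.kind !== 'call' || a.fn !== b.fn || a.args.length !== b.args.length) {
        return false
      }
      const bArgs = b.args
      return a.args.every((arg, i) => sameShape(arg, bArgs[i]))
    }
  }
}

const valueCol: Expr = { kind: 'column', name: 'value' }
const indexExpr: Expr = { kind: 'call', fn: 'json_extract', args: [valueCol, { kind: 'literal', value: '$.threadId' }] }
const boundQueryExpr: Expr = { kind: 'call', fn: 'json_extract', args: [valueCol, { kind: 'param' }] }
const inlinedQueryExpr: Expr = { kind: 'call', fn: 'json_extract', args: [valueCol, { kind: 'literal', value: '$.threadId' }] }

// sameShape(indexExpr, boundQueryExpr) is false; sameShape(indexExpr, inlinedQueryExpr) is true.
```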

To use the threadId expression index, the runtime query should instead have compiled to the shape below. The key difference is that the threadId expression uses literal JSON paths, while the actual thread ID remains bound:

```sql
SELECT key, value, metadata, row_version
FROM "c_z6sj6b_e"
WHERE ((CASE json_extract(value, '$.threadId.__tanstack_db_persisted_type__')
  WHEN 'bigint' THEN CAST(json_extract(value, '$.threadId.value') AS NUMERIC)
  WHEN 'date' THEN json_extract(value, '$.threadId.value')
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, '$.threadId')
END) = ?)
ORDER BY (CASE json_extract(value, '$.createdAt.__tanstack_db_persisted_type__')
  WHEN 'bigint' THEN CAST(json_extract(value, '$.createdAt.value') AS NUMERIC)
  WHEN 'date' THEN json_extract(value, '$.createdAt.value')
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, '$.createdAt')
END) DESC NULLS FIRST,
(CASE json_extract(value, '$.id.__tanstack_db_persisted_type__')
  WHEN 'bigint' THEN CAST(json_extract(value, '$.id.value') AS NUMERIC)
  WHEN 'date' THEN json_extract(value, '$.id.value')
  WHEN 'nan' THEN NULL
  WHEN 'infinity' THEN NULL
  WHEN '-infinity' THEN NULL
  ELSE json_extract(value, '$.id')
END) DESC NULLS FIRST,
key ASC
```

with params shaped like:

```ts
['92ebe40d-e545-40f2-b0fd-bba86c41b86e']
```

That full query shape would let SQLite match the filter expression against the persisted threadId expression index. The remaining ORDER BY could still require a temp sort, which is fine and expected for this query shape.

This PR fixes that by compiling runtime ref expressions with inlined JSON-path literals in compileRefExpressionSql(...), while still keeping real filter values bound.

After this change, the runtime SQL shape matches the persisted index expression shape, which allows SQLite to use the index for the filter path.
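The idea behind the fix can be sketched as follows. `compileRefInlined` and `compileEq` are hypothetical stand-ins, not the actual `compileRefExpressionSql(...)` implementation: JSON paths are emitted as SQL string literals (single quotes doubled, which is how SQLite escapes quotes inside string literals), and only the comparison value is pushed onto the parameter list.

```ts
interface CompiledSql {
  sql: string
  params: unknown[]
}

const TYPE_TAG = '__tanstack_db_persisted_type__'

// Render text as a SQLite string literal: wrap in single quotes and
// double any embedded single quote.
function sqlStringLiteral(text: string): string {
  return `'${text.replace(/'/g, "''")}'`
}

// Hypothetical post-fix compiler for a column reference: every JSON path is
// inlined as a literal so the expression can match the persisted index.
// Assumes `field` is a plain identifier; names containing "." or '"' would
// also need JSON-path quoting, which this sketch does not attempt.
function compileRefInlined(field: string): string {
  const extract = (path: string) => `json_extract(value, ${sqlStringLiteral(path)})`
  return [
    `(CASE ${extract(`$.${field}.${TYPE_TAG}`)}`,
    `  WHEN 'bigint' THEN CAST(${extract(`$.${field}.value`)} AS NUMERIC)`,
    `  WHEN 'date' THEN ${extract(`$.${field}.value`)}`,
    "  WHEN 'nan' THEN NULL",
    "  WHEN 'infinity' THEN NULL",
    "  WHEN '-infinity' THEN NULL",
    `  ELSE ${extract(`$.${field}`)}`,
    'END)',
  ].join('\n')
}

// `<ref> = ?`: the path is inlined, the filter value stays bound.
function compileEq(field: string, value: unknown): CompiledSql {
  return { sql: `(${compileRefInlined(field)} = ?)`, params: [value] }
}

const threadFilter = compileEq('threadId', '92ebe40d-e545-40f2-b0fd-bba86c41b86e')
// threadFilter.sql contains json_extract(value, '$.threadId') and no json_extract(value, ?)
// threadFilter.params is ['92ebe40d-e545-40f2-b0fd-bba86c41b86e']
```

Keeping the value bound preserves the usual benefits of parameters (no value escaping, reusable prepared statements per field), while the paths, which come from the schema rather than user input, become part of the expression SQLite can match.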

It also adds two regression tests:

  • core/shared regression: verifies that runtime subset SQL inlines JSON paths but keeps actual values bound
  • node driver regression: verifies via EXPLAIN QUERY PLAN that better-sqlite3 uses the expression index for the emitted runtime SQL
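The core/shared regression boils down to two checks on the emitted `{ sql, params }`. The framework-free sketch below uses a hypothetical helper, `checkInlinedPaths`; it is not the vitest test added in this PR, only an illustration of what such a test asserts.

```ts
interface CompiledSql {
  sql: string
  params: readonly unknown[]
}

// Returns a list of problems; an empty list means the query has the
// "inlined paths, bound values" shape this PR expects.
function checkInlinedPaths(compiled: CompiledSql, expectedParams: readonly unknown[]): string[] {
  const problems: string[] = []
  // A JSON path passed as a parameter prevents expression-index matching.
  if (/json_extract\(\s*value\s*,\s*\?\s*\)/.test(compiled.sql)) {
    problems.push('JSON path is bound as a parameter instead of inlined')
  }
  // Only real filter values should remain bound, in order.
  const sameParams =
    compiled.params.length === expectedParams.length &&
    compiled.params.every((p, i) => p === expectedParams[i])
  if (!sameParams) {
    problems.push(`unexpected params: ${JSON.stringify(compiled.params)}`)
  }
  return problems
}

const fixedShape = checkInlinedPaths(
  { sql: "WHERE (json_extract(value, '$.threadId') = ?)", params: ['t1'] },
  ['t1'],
)
// fixedShape is [] (no problems)
```

Run against the pre-fix shape, both checks fail, which mirrors the negative verification described under Verification.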

✅ Checklist

  • I have tested this code locally with pnpm test.

🚀 Release Impact

  • This change affects published code, and I have generated a changeset.
  • This change is docs/CI/dev-only (no release).

Verification

Passed on the fixed tree:

  • pnpm exec vitest --run tests/sqlite-core-adapter-cli-runtime.test.ts
  • pnpm exec vitest --run tests/node-sqlite-core-adapter-contract.test.ts
  • pnpm --filter @tanstack/db-sqlite-persistence-core test

Negative verification:

I temporarily reverted only compileRefExpressionSql(...) locally and re-ran the targeted regressions.

Without the fix:

  • the SQL-shape regression failed because the emitted query contained json_extract(value, ?) instead of inlined JSON paths
  • the node planner regression failed because the runtime query reintroduced bound JSON-path params ('$.threadId.__tanstack_db_persisted_type__', '$.threadId.value', etc.) instead of keeping only the real filter value bound

lalitkapoor marked this pull request as ready for review on April 21, 2026 01:36