perf: use sargable (Search ARGument ABLE) range predicates for datetime search filters #809

Merged
aldbr merged 3 commits into DIRACGrid:main from ryuwd:roneil-date-format-slow
Feb 26, 2026
Conversation

@ryuwd ryuwd commented Feb 26, 2026

Contributor
  • apply_search_filters previously used date_trunc() to wrap datetime columns in date_format() (MySQL) / strftime() (SQLite), which prevents the database from using indexes on those columns
  • Replaced with range-based comparisons on the raw column (e.g., col >= start AND col < end instead of date_format(col, '%Y-%m-%d') = '2025-08-25')
  • All precision levels (YEAR through SECOND) and all comparison operators (eq, neq, gt, lt, in, not in) are handled with index-friendly range predicates
-- Direct comparison (uses index): 0.649s
SELECT count(*) FROM Jobs
WHERE Status = 'Done' AND Site = 'LCG.CERN.cern'
  AND LastUpdateTime > '2025-08-25 15:35:31';

-- date_format() wrapper (full table scan): 12.806s
SELECT count(*) FROM Jobs
WHERE Status = 'Done' AND Site = 'LCG.CERN.cern'
  AND date_format(LastUpdateTime, '%Y-%m-%d %H:%i:%S') > '2025-08-25 15:35:31';
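
The range-based replacement depends on turning a truncated value such as `2025-08-25` into a half-open interval `[start, end)`. The following is a minimal sketch of that step, not the PR's actual helper: the function name `period_bounds`, the `FORMATS` table, and the input string formats are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical mapping from precision level to the format of the truncated
# value a client might send (e.g. "2025-08" for MONTH).
FORMATS = {
    "YEAR": "%Y",
    "MONTH": "%Y-%m",
    "DAY": "%Y-%m-%d",
    "HOUR": "%Y-%m-%d %H",
    "MINUTE": "%Y-%m-%d %H:%M",
    "SECOND": "%Y-%m-%d %H:%M:%S",
}

# Fixed-length periods can be advanced with a timedelta.
STEPS = {
    "DAY": timedelta(days=1),
    "HOUR": timedelta(hours=1),
    "MINUTE": timedelta(minutes=1),
    "SECOND": timedelta(seconds=1),
}


def period_bounds(value: str, precision: str) -> tuple[datetime, datetime]:
    """Return the half-open interval [start, end) covered by a truncated value."""
    start = datetime.strptime(value, FORMATS[precision])
    if precision == "YEAR":
        return start, start.replace(year=start.year + 1)
    if precision == "MONTH":
        # Months vary in length, so roll the month (and year) over by hand.
        if start.month == 12:
            return start, start.replace(year=start.year + 1, month=1)
        return start, start.replace(month=start.month + 1)
    return start, start + STEPS[precision]


start, end = period_bounds("2025-08-25", "DAY")
# An equality filter on the day then becomes:
#   LastUpdateTime >= start AND LastUpdateTime < end
```

Using an exclusive upper bound avoids having to guess the last representable instant of a period (for example `23:59:59.999999`), which depends on the column's fractional-second precision.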

Closes #642

Replace date_trunc() (which wraps columns in date_format() on MySQL)
with range-based comparisons on the raw column. This allows the database
to use indexes on datetime columns instead of performing full table scans.

Copilot AI left a comment

Pull request overview

This PR optimizes datetime search filter performance by replacing non-sargable date_trunc() function calls with index-friendly range predicates. The old implementation wrapped datetime columns in date_format() (MySQL) or strftime() (SQLite) which prevented database index usage, causing full table scans. The new implementation uses direct range comparisons (col >= start AND col < end) that allow the database to utilize indexes on datetime columns, resulting in significant performance improvements (e.g., 0.649s vs 12.806s in the example provided).
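
The scan-versus-seek difference described above can be observed locally with SQLite's `EXPLAIN QUERY PLAN`. This is a minimal sketch, not the PR's benchmark: the table is a cut-down, hypothetical stand-in for `Jobs`, the index name is invented, and it shows only the chosen plan, not timings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: only the column needed for the demo.
conn.execute("CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, LastUpdateTime TEXT)")
conn.execute("CREATE INDEX idx_last_update ON Jobs (LastUpdateTime)")


def plan(sql: str) -> str:
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)


# Column wrapped in a function: the index cannot be used to seek.
wrapped = plan(
    "SELECT count(*) FROM Jobs "
    "WHERE strftime('%Y-%m-%d', LastUpdateTime) = '2025-08-25'"
)

# Raw column compared against a half-open range: the index can be searched.
ranged = plan(
    "SELECT count(*) FROM Jobs "
    "WHERE LastUpdateTime >= '2025-08-25 00:00:00' "
    "AND LastUpdateTime < '2025-08-26 00:00:00'"
)

print(wrapped)  # a SCAN step: every entry is visited
print(ranged)   # a SEARCH step: the index is used to seek to the range
```

The exact wording of the plan varies between SQLite versions, so the comments describe only the kind of step to expect.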

Changes:

  • Removed date_trunc() wrapper and import from base.py
  • Added three new helper functions to compute datetime period boundaries and build sargable range expressions
  • Modified apply_search_filters to use range predicates for all datetime search operations (eq, neq, gt, lt, in, not in) across all precision levels (YEAR, MONTH, DAY, HOUR, MINUTE, SECOND)
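
One way the six operators could map onto range predicates is sketched below. This is an illustration under assumptions, not the code in `base.py`: the function `range_predicate`, its return shape, and the exact handling of `gt` and `lt` (treated here as "after the whole period" and "before the whole period") are hypothetical.

```python
from datetime import datetime

Bounds = tuple[datetime, datetime]  # half-open interval [start, end)


def range_predicate(
    column: str, operator: str, bounds: list[Bounds]
) -> tuple[str, list[datetime]]:
    """Build a WHERE fragment and its bind parameters without wrapping `column`.

    `column` must be a trusted identifier; only the datetime values are bound.
    """
    params = [value for pair in bounds for value in pair]
    if operator in ("eq", "in"):
        # The row falls inside at least one of the periods.
        sql = " OR ".join(f"({column} >= ? AND {column} < ?)" for _ in bounds)
        return sql, params
    if operator in ("neq", "not in"):
        # The row falls outside every one of the periods.
        sql = " AND ".join(f"({column} < ? OR {column} >= ?)" for _ in bounds)
        return sql, params
    [(start, end)] = bounds  # scalar operators take exactly one period
    if operator == "gt":
        return f"{column} >= ?", [end]
    if operator == "lt":
        return f"{column} < ?", [start]
    raise ValueError(f"unsupported operator: {operator}")


day = (datetime(2025, 8, 25), datetime(2025, 8, 26))
sql, params = range_predicate("LastUpdateTime", "eq", [day])
```

In every branch the column appears bare on the left-hand side of a comparison, which is what keeps the predicate sargable. At SECOND precision, `col >= end` matches `col > value` only if the column stores no fractional seconds.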

Comment thread diracx-db/src/diracx/db/sql/utils/base.py Outdated
Comment thread diracx-db/src/diracx/db/sql/utils/base.py Outdated
@DIRACGridBot DIRACGridBot marked this pull request as draft February 26, 2026 12:54
Replace raw string comparisons with ScalarSearchOperator,
VectorSearchOperator, and a new TimeResolution StrEnum for type safety
in apply_search_filters and the datetime range helpers.
@ryuwd ryuwd requested a review from aldbr February 26, 2026 13:07

@aldbr aldbr left a comment

Contributor

Oops I don't know what happened but it looks like some of my comments were removed 😅

Comment thread diracx-db/src/diracx/db/sql/utils/base.py
Comment thread diracx-db/src/diracx/db/sql/utils/base.py
Comment thread diracx-db/src/diracx/db/sql/utils/base.py
Remove the date_trunc function from functions.py as it has no remaining
callers. Replace elif with if in range builder functions since each
branch returns early.
@ryuwd ryuwd marked this pull request as ready for review February 26, 2026 13:57

@aldbr aldbr left a comment

Contributor

LGTM

@aldbr aldbr enabled auto-merge (squash) February 26, 2026 14:05
@aldbr aldbr merged commit c94dad5 into DIRACGrid:main Feb 26, 2026
38 of 42 checks passed
Stellatsuu pushed a commit to Stellatsuu/diracx that referenced this pull request Mar 10, 2026
…me search filters (DIRACGrid#809)

* perf: use sargable range predicates for datetime search filters

Linked issue: date_format is slow

3 participants