Skip to content

fix: eliminate N+1 queries in _load_candidate_sessions event_type filter#133

Merged
acailic merged 2 commits into
mainfrom
fix/issue-132-n-plus-one-event-type-filter
Apr 3, 2026
Merged

fix: eliminate N+1 queries in _load_candidate_sessions event_type filter#133
acailic merged 2 commits into
mainfrom
fix/issue-132-n-plus-one-event-type-filter

Conversation

@acailic
Copy link
Copy Markdown
Owner

@acailic acailic commented Apr 3, 2026

Summary

Fixes #132.

  • Replaces the Python loop in storage/search.py:_load_candidate_sessions that issued one SELECT … LIMIT 1 query per candidate session when an event_type filter was active
  • Uses a single SQL IN subquery instead, pushing the filter into the initial session SELECT and reducing query count from O(n sessions) to O(1)

Root cause

# before: N separate round-trips
for session in sessions:
    event_stmt = select(EventModel).where(
        EventModel.session_id == session.id, ...
    ).limit(1)
    event_result = await self.session.execute(event_stmt)
    ...

With 100 candidate sessions this issued 101 SQL queries; with 1000 it issued 1001.

Fix

# after: 0 extra round-trips — pushed into the main SELECT
event_subq = (
    select(EventModel.session_id)
    .where(EventModel.tenant_id == self.tenant_id,
           EventModel.event_type == event_type_str)
    .distinct()
    .scalar_subquery()
)
stmt = stmt.where(SessionModel.id.in_(event_subq))

Test plan

  • ruff check storage/search.py — no issues
  • pytest tests/test_search_routes.py tests/test_collector_baseline.py — 72 passed

🤖 Generated with Claude Code

…ter (#132)

Replace the Python loop that issued one SQL query per session when
filtering by event_type with a single IN subquery pushed into the
initial session SELECT. Query count drops from O(n) to O(1) regardless
of the number of candidate sessions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings April 3, 2026 20:56
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses #132 by removing an N+1 query pattern in SessionSearchService._load_candidate_sessions when applying an event_type filter, pushing the filter into SQL to keep query count constant.

Changes:

  • Replaced per-session event probing queries with a single IN (SELECT …) subquery filter.
  • Moved the session query execution to after all filters (including event_type) are applied.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread storage/search.py Outdated
sessions_with_event_type.append(session)
sessions = sessions_with_event_type
)
.distinct()
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DISTINCT in the event_type subquery is unnecessary for correctness (duplicates in an IN (SELECT …) don’t change results) and may add avoidable sort/hash work on large event tables. Consider dropping .distinct() or switching to a correlated EXISTS predicate, which often plans better and avoids materializing a distinct set.

Suggested change
.distinct()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped .distinct() from the subquery — agreed it is unnecessary for IN(SELECT ...) correctness and adds avoidable overhead on large event tables. Fixed in 51549ab.

Comment thread storage/search.py
Comment on lines +318 to +334
# Filter by event_type in SQL using a subquery to avoid N+1 queries.
# Previously this was done with a Python loop issuing one query per session.
if event_type:
sessions_with_event_type = []
for session in sessions:
# Check if session has any event of the specified type
# Handle both string and enum values
event_type_str = event_type.value if hasattr(event_type, 'value') else event_type

event_stmt = select(EventModel).where(
EventModel.session_id == session.id,
event_type_str = event_type.value if hasattr(event_type, "value") else event_type
event_subq = (
select(EventModel.session_id)
.where(
EventModel.tenant_id == self.tenant_id,
EventModel.event_type == event_type_str,
).limit(1)
event_result = await self.session.execute(event_stmt)
if event_result.scalar_one_or_none() is not None:
sessions_with_event_type.append(session)
sessions = sessions_with_event_type
)
.distinct()
.scalar_subquery()
)
stmt = stmt.where(SessionModel.id.in_(event_subq))

return sessions
result = await self.session.execute(stmt)
return list(result.scalars().all())
Copy link

Copilot AI Apr 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are existing unit tests for _load_candidate_sessions (e.g., status filtering), but this change significantly alters the event_type filtering behavior by pushing it into SQL. Please add a test that asserts the generated query includes the event_type predicate (or, preferably, an integration-style test that verifies sessions without matching events are excluded).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added two unit tests for _load_candidate_sessions with event_type filter in 51549ab: test_load_candidate_sessions_event_type_filter_in_sql asserts exactly one DB query is issued and that events.event_type appears in the WHERE clause (confirming the predicate is in SQL, not a Python loop); test_load_candidate_sessions_event_type_excludes_non_matching_sessions asserts that when the DB returns no rows the result is empty and still only one query was issued.

…dd tests

- Remove unnecessary .distinct() from the event_type IN-subquery; duplicates
  in an IN(SELECT ...) don't affect correctness and the extra sort/hash work
  is avoidable on large event tables.
- Add two unit tests for _load_candidate_sessions with event_type filter:
  one verifies the predicate is pushed into SQL (single query, no N+1),
  the other asserts empty DB results propagate correctly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@acailic acailic merged commit 0471c3d into main Apr 3, 2026
6 checks passed
@acailic acailic deleted the fix/issue-132-n-plus-one-event-type-filter branch April 4, 2026 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: N+1 query in _load_candidate_sessions when filtering by event_type

2 participants