fix: eliminate N+1 queries in _load_candidate_sessions event_type filter#133
Conversation
…ter (#132) Replace the Python loop that issued one SQL query per session when filtering by event_type with a single IN subquery pushed into the initial session SELECT. Query count drops from O(n) to O(1) regardless of the number of candidate sessions. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses #132 by removing an N+1 query pattern in SessionSearchService._load_candidate_sessions when applying an event_type filter, pushing the filter into SQL to keep query count constant.
Changes:
- Replaced per-session event probing queries with a single
IN (SELECT …)subquery filter. - Moved the session query execution to after all filters (including
event_type) are applied.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| sessions_with_event_type.append(session) | ||
| sessions = sessions_with_event_type | ||
| ) | ||
| .distinct() |
There was a problem hiding this comment.
DISTINCT in the event_type subquery is unnecessary for correctness (duplicates in an IN (SELECT …) don’t change results) and may add avoidable sort/hash work on large event tables. Consider dropping .distinct() or switching to a correlated EXISTS predicate, which often plans better and avoids materializing a distinct set.
| .distinct() |
There was a problem hiding this comment.
Dropped .distinct() from the subquery — agreed it is unnecessary for IN(SELECT ...) correctness and adds avoidable overhead on large event tables. Fixed in 51549ab.
| # Filter by event_type in SQL using a subquery to avoid N+1 queries. | ||
| # Previously this was done with a Python loop issuing one query per session. | ||
| if event_type: | ||
| sessions_with_event_type = [] | ||
| for session in sessions: | ||
| # Check if session has any event of the specified type | ||
| # Handle both string and enum values | ||
| event_type_str = event_type.value if hasattr(event_type, 'value') else event_type | ||
|
|
||
| event_stmt = select(EventModel).where( | ||
| EventModel.session_id == session.id, | ||
| event_type_str = event_type.value if hasattr(event_type, "value") else event_type | ||
| event_subq = ( | ||
| select(EventModel.session_id) | ||
| .where( | ||
| EventModel.tenant_id == self.tenant_id, | ||
| EventModel.event_type == event_type_str, | ||
| ).limit(1) | ||
| event_result = await self.session.execute(event_stmt) | ||
| if event_result.scalar_one_or_none() is not None: | ||
| sessions_with_event_type.append(session) | ||
| sessions = sessions_with_event_type | ||
| ) | ||
| .distinct() | ||
| .scalar_subquery() | ||
| ) | ||
| stmt = stmt.where(SessionModel.id.in_(event_subq)) | ||
|
|
||
| return sessions | ||
| result = await self.session.execute(stmt) | ||
| return list(result.scalars().all()) |
There was a problem hiding this comment.
There are existing unit tests for _load_candidate_sessions (e.g., status filtering), but this change significantly alters the event_type filtering behavior by pushing it into SQL. Please add a test that asserts the generated query includes the event_type predicate (or, preferably, an integration-style test that verifies sessions without matching events are excluded).
There was a problem hiding this comment.
Added two unit tests for _load_candidate_sessions with event_type filter in 51549ab: test_load_candidate_sessions_event_type_filter_in_sql asserts exactly one DB query is issued and that events.event_type appears in the WHERE clause (confirming the predicate is in SQL, not a Python loop); test_load_candidate_sessions_event_type_excludes_non_matching_sessions asserts that when the DB returns no rows the result is empty and still only one query was issued.
…dd tests - Remove unnecessary .distinct() from the event_type IN-subquery; duplicates in an IN(SELECT ...) don't affect correctness and the extra sort/hash work is avoidable on large event tables. - Add two unit tests for _load_candidate_sessions with event_type filter: one verifies the predicate is pushed into SQL (single query, no N+1), the other asserts empty DB results propagate correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Fixes #132.
storage/search.py:_load_candidate_sessionsthat issued oneSELECT … LIMIT 1query per candidate session when anevent_typefilter was activeINsubquery instead, pushing the filter into the initial sessionSELECTand reducing query count from O(n sessions) to O(1)Root cause
With 100 candidate sessions this issued 101 SQL queries; with 1000 it issued 1001.
Fix
Test plan
ruff check storage/search.py— no issuespytest tests/test_search_routes.py tests/test_collector_baseline.py— 72 passed🤖 Generated with Claude Code