Date: 2026-02-02
Status: Accepted
The `Opensearch` model (`app/models/opensearch.rb`) has grown into a large, monolithic class (360+ lines).
The model:
- Builds the entire OpenSearch request body (query, aggregations, sort, highlight) in one place.
- Mixes multiple concerns:
  - Lexical query construction (`multi_match`, single-field matches, nested matches).
  - Geographic constraints (geodistance, bounding box).
  - Aggregation filters (contributors, formats, languages, access rights, etc.).
  - Top-level request assembly and client invocation.
- Is already difficult to understand and extend safely. We know we will soon add semantic and hybrid (lexical + semantic) search behaviors, which will further increase complexity.
Current state:
- The GraphQL `search` field (`app/graphql/types/query_type.rb`) is the main entry point and calls `Opensearch.new.search(...)`.
- All lexical and geo query construction is hardcoded inside `Opensearch#query`, `#matches`, `#multisearch`, and various helper methods.
- Filter logic (`filters`, `filter_field_by_value`, etc.) is also embedded in the same class.
We want to:
- Make the query-building logic easier to understand and test in isolation.
- Prepare for additional query modes (semantic, hybrid).
- Keep the external API and behavior unchanged for now (same GraphQL schema, same OpenSearch request shape).
We will refactor the Opensearch model to introduce:
- A query strategy abstraction (Strategy pattern)
  - Define a simple query-strategy contract, e.g. a module `Opensearch::QueryStrategy` with a single method: `build(params, fulltext) # => Hash`.
  - `Opensearch#search` will:
    - Select a query strategy (for now, always the lexical strategy).
    - Delegate query construction to `strategy.build(params, fulltext)` inside `build_query`.
  - Future query modes (semantic, hybrid) will be modeled as additional strategies that implement the same interface.
  - NOTE: we could choose to defer implementing the Strategy abstraction and focus solely on the Builders for now. The abstraction is relatively simple, but it is not required to gain the core benefits of this refactor.
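A minimal sketch of the contract and the delegation inside `build_query`, assuming a strategy is a plain Ruby object that responds to `build(params, fulltext)`. The stand-in `LexicalQueryBuilder` body, the `query_strategy` helper, and the `build_query` signature are hypothetical; the real lexical builder emits the full query described in the next item.

```ruby
# Sketch only: query_strategy and the build_query signature are hypothetical.
class Opensearch
  # Contract: a strategy returns the Hash used as the request's `query` value.
  module QueryStrategy
    def build(_params, _fulltext)
      raise NotImplementedError, "#{self.class} must implement build(params, fulltext)"
    end
  end

  # Stand-in lexical strategy; the real one would live in lexical_query_builder.rb.
  class LexicalQueryBuilder
    include QueryStrategy

    def build(params, _fulltext)
      { bool: { must: [{ multi_match: { query: params[:q].to_s } }] } }
    end
  end

  # Hypothetical selection helper: for now, always the lexical strategy.
  def query_strategy(_params)
    LexicalQueryBuilder.new
  end

  # Hypothetical build_query: only the `query` portion is delegated to the strategy.
  def build_query(params, fulltext = false)
    { query: query_strategy(params).build(params, fulltext) }
  end
end

body = Opensearch.new.build_query({ q: "maps" })
# => { query: { bool: { must: [{ multi_match: { query: "maps" } }] } } }
```

Because the module's default `build` raises `NotImplementedError`, a new strategy that forgets to implement the method fails loudly instead of silently returning `nil`.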
- A lexical query builder as the first strategy (Builder + Strategy)
  - Create `Opensearch::LexicalQueryBuilder` (e.g. `app/models/opensearch/lexical_query_builder.rb`), which:
    - Implements the `QueryStrategy` interface.
    - Encapsulates the current “lexical” query behavior:
      - `bool` query structure (`must`/`should`/`filter`).
      - `multisearch` (prefix/term boosts for title, contributors, etc.).
      - `matches` for the main `q`.
      - Single-field and nested matches (citation, title, contributors, subjects, etc.).
      - Geographic clauses (geodistance, bounding box) as part of the overall `bool` query.
    - Uses `FilterBuilder` (see next item) for the filter portion of the `bool` query.
  - `Opensearch` becomes an orchestrator:
    - It builds the top-level request: `from`, `size`, `query` (via the strategy), `aggregations`, `sort`, and optional `highlight`.
    - It no longer encodes the detailed lexical/geo/field logic directly.
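A sketch of how `LexicalQueryBuilder` could compose the `bool` query. Everything concrete here is an assumption for illustration: the param keys (`q`, `geodistance`), the field names and boost values, and injecting the filter builder as a callable (in the app this might be `->(p) { Opensearch::FilterBuilder.new(p).build }`). Putting the geo clause in filter context is also an assumption, and `include QueryStrategy` is omitted so the block runs on its own.

```ruby
class Opensearch
  # Sketch of the lexical strategy; field names, boosts, and param keys are illustrative.
  class LexicalQueryBuilder
    # filter_builder: any callable mapping params to an Array of filter clauses.
    def initialize(filter_builder: ->(_params) { [] })
      @filter_builder = filter_builder
    end

    def build(params, _fulltext)
      {
        bool: {
          must: matches(params),
          should: multisearch(params),
          filter: @filter_builder.call(params) + geo_clauses(params)
        }
      }
    end

    private

    # Main keyword match on q (stand-in for the existing #matches).
    def matches(params)
      return [] if params[:q].to_s.empty?

      [{ multi_match: { query: params[:q], fields: %w[title^2 contributors.value summary] } }]
    end

    # Relevance boosts in the spirit of #multisearch: prefix and exact term on title.
    def multisearch(params)
      term = params[:q].to_s.downcase
      return [] if term.empty?

      [
        { prefix: { "title.exact_value": { value: term, boost: 15.0 } } },
        { term: { "title.exact_value": { value: term, boost: 1.0 } } }
      ]
    end

    # Distance constraint only; a bounding-box clause would be appended the same way.
    def geo_clauses(params)
      geo = params[:geodistance]
      return [] unless geo

      [{ geo_distance: { distance: geo[:distance],
                         "locations.geopoint": { lat: geo[:latitude], lon: geo[:longitude] } } }]
    end
  end
end
```

Because the filter builder is injected, a unit test can pass a fake callable and assert only on the `bool` structure.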
- A dedicated filter builder (Builder pattern)
  - Create `Opensearch::FilterBuilder` (e.g. `app/models/opensearch/filter_builder.rb`).
  - Move all aggregation/filter-related responsibilities out of `Opensearch`:
    - `filters`
    - `filter_field_by_value`
    - `filter_sources` / `source_array`
    - `filter_access_to_files` / `access_to_files_array`
  - `FilterBuilder` will:
    - Accept the search params.
    - Return the array of filter clauses used in the OpenSearch query.
  - We expect future strategies to reuse this builder, which is why it is pulled out of `LexicalQueryBuilder`.
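One possible shape for `FilterBuilder`, assuming filters arrive as arrays of selected values under hypothetical keys such as `languages_filter` and `source_filter`. The field mapping, the `build` method name, and the semantics (each selected term value must match, while sources are OR-ed) are assumptions for illustration, not a description of how the current `filter_field_by_value` or `filter_sources` behave.

```ruby
class Opensearch
  # Sketch: turns search params into the Array used as the bool query's `filter`.
  class FilterBuilder
    # Hypothetical mapping of param key => indexed field.
    TERM_FILTERS = {
      contributors_filter: "contributors.value.keyword",
      format_filter: "format",
      languages_filter: "languages"
    }.freeze

    def initialize(params)
      @params = params
    end

    def build
      clauses = TERM_FILTERS.flat_map { |key, field| filter_field_by_value(field, @params[key]) }
      clauses << source_filter if Array(@params[:source_filter]).any?
      clauses
    end

    private

    # One term clause per selected value; in filter context every clause must match.
    def filter_field_by_value(field, values)
      Array(values).map { |value| { term: { field => value } } }
    end

    # Sources are OR-ed: a bool with only `should` clauses needs at least one to match.
    def source_filter
      { bool: { should: Array(@params[:source_filter]).map { |s| { term: { source: s } } } } }
    end
  end
end
```

Returning a plain Array keeps the builder independent of any one strategy: lexical, semantic, or hybrid queries can all drop it into their own `filter` slot.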
- A dedicated sort builder
  - Sort is currently very simple, but we will take this opportunity to move it out of `Opensearch`.
  - This prepares us for additional sort algorithms, though the main goal is to simplify the `Opensearch` class.
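A sketch of the sort builder, assuming a hypothetical `SortBuilder` class and a hypothetical `sort` param. Relevance ordering (`_score` descending) stands in for the current simple sort; the `title` mode and its field name are invented purely to show where additional sort algorithms would plug in.

```ruby
class Opensearch
  # Hypothetical SortBuilder: returns the value for the request's `sort` key.
  class SortBuilder
    def initialize(params = {})
      @mode = params.fetch(:sort, "relevance").to_s
    end

    # Unknown modes fall back to relevance so the request never lacks a sort.
    def build
      case @mode
      when "title" then [{ "title.keyword": { order: "asc" } }, relevance] # illustrative future mode
      else [relevance]
      end
    end

    private

    # Fresh hash on every call so callers never share mutable state.
    def relevance
      { _score: { order: "desc" } }
    end
  end
end
```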
- No external API or behavior changes
  - The signature of `Opensearch#search` remains the same.
  - The GraphQL schema and `construct_query` behavior remain unchanged.
  - The shape of the OpenSearch request (including query, filters, aggregations, sort, highlight) is preserved.
  - Existing tests (including VCR-based tests) should continue to pass unchanged.
- Improved maintainability and readability
  - `Opensearch` is smaller and focused on orchestration and top-level request assembly.
  - Lexical query details and filter details are encapsulated in dedicated classes.
- Better testability
  - `FilterBuilder` and `LexicalQueryBuilder` can be unit-tested in isolation:
    - Given specific params, they produce well-defined hash structures.
    - We can more easily assert on the shape of queries and filters without coupling to client calls.
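An illustration of the kind of isolated unit test this enables, written with plain Minitest (the project may use `ActiveSupport::TestCase` or another framework instead). A tiny stand-in `FilterBuilder` is inlined so the example runs alone; its `languages_filter` key and `build` method are assumptions.

```ruby
require "minitest/autorun"

class Opensearch
  # Minimal stand-in so this example runs on its own; the real class lives in app/models.
  class FilterBuilder
    def initialize(params)
      @params = params
    end

    def build
      Array(@params[:languages_filter]).map { |lang| { term: { languages: lang } } }
    end
  end
end

# Pure hash-in/hash-out tests: no OpenSearch client and no VCR cassette involved.
class FilterBuilderTest < Minitest::Test
  def test_one_term_clause_per_selected_language
    filters = Opensearch::FilterBuilder.new(languages_filter: %w[English French]).build

    assert_equal [{ term: { languages: "English" } }, { term: { languages: "French" } }], filters
  end

  def test_no_params_means_no_filters
    assert_empty Opensearch::FilterBuilder.new({}).build
  end
end
```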
- Clear extension point for future features
  - Semantic and hybrid search can be added later by:
    - Implementing new query strategies (e.g. `SemanticQueryStrategy`, `HybridQueryStrategy`) that conform to the same `build(params, fulltext)` interface.
    - Reusing `FilterBuilder` for aggregation filters and geo constraints.
    - Adding a small “strategy selection” step in `Opensearch` (e.g. based on a `search_mode` param).
  - This lets us focus solely on the semantic/hybrid work when we get to it, rather than combining a refactor with a new feature.
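A sketch of the small selection step, keyed on the `search_mode` param mentioned above. The `STRATEGIES` registry, the `strategy_for` method, and both stub classes are hypothetical; the semantic entry deliberately raises because semantic search is out of scope for this decision.

```ruby
class Opensearch
  # Stub lexical strategy so the sketch runs standalone.
  class LexicalQueryBuilder
    def build(params, _fulltext)
      { bool: { must: [{ multi_match: { query: params[:q].to_s } }] } }
    end
  end

  # Hypothetical future strategy; the real one belongs to a later ADR.
  class SemanticQueryStrategy
    def build(_params, _fulltext)
      raise NotImplementedError, "semantic search is not implemented yet"
    end
  end

  # Hypothetical registry; a HybridQueryStrategy would be registered the same way.
  STRATEGIES = {
    "lexical" => LexicalQueryBuilder,
    "semantic" => SemanticQueryStrategy
  }.freeze

  # Missing or unknown modes fall back to lexical, preserving today's behavior.
  def self.strategy_for(params)
    STRATEGIES.fetch(params[:search_mode].to_s, LexicalQueryBuilder).new
  end
end
```

Falling back to lexical keeps the GraphQL API unchanged until a `searchMode` argument is actually introduced.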
- Easier incremental refactors
  - Additional internal refactoring (e.g. introducing a `SearchParams` value object or specialized helpers for geo) can be done within builders/strategies without touching the external API.
- More classes and indirection
  - Adding `FilterBuilder`, `QueryStrategy`, and `LexicalQueryBuilder` increases the number of moving parts.
  - Developers must follow the new abstractions to trace how a query is built.
- Short-term refactor cost
  - Code must be moved carefully to avoid breaking existing behavior and VCR tests.
  - There is some overhead in introducing new tests for `FilterBuilder` and `LexicalQueryBuilder`.
- Semantic and hybrid search are not implemented in this decision
  - No k-NN/vector fields, Neural Search configuration, or hybrid score combination are added here.
  - No new GraphQL arguments (e.g. `searchMode`) are introduced at this time.
  - Index mappings, ingest pipelines, and model selection remain unchanged.
These will be covered by future ADR(s) once we are ready to add semantic and hybrid query capabilities on top of the refactored structure.
A simplified view of the target architecture (including future state of Semantic/Hybrid builders):
```mermaid
flowchart LR
  subgraph api [API layer]
    GraphQL[query_type search]
  end
  subgraph opensearch [Opensearch#search]
    build[build_query]
    strategy_sel[Strategy selection]
  end
  subgraph strategies [Query strategies]
    QueryStrategy[QueryStrategy interface]
    Lexical[LexicalQueryBuilder]
    Filters[FilterBuilder]
    Semantic[SemanticQueryBuilder]
    Hybrid[HybridQueryBuilder]
  end
  GraphQL --> build
  build --> strategy_sel
  strategy_sel --> QueryStrategy
  QueryStrategy -->|implemented by| Lexical
  QueryStrategy -->|implemented by| Semantic
  QueryStrategy -->|implemented by| Hybrid
  Lexical --> Filters
  build --> Aggregations
  build --> Sort
```
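The orchestration in the diagram can be sketched as follows, assuming `build_query` assembles the whole request body. The page size, the `from` param, the stub strategy, the single `languages` aggregation, the inline sort, and the wildcard highlight are placeholders rather than the current values in `Opensearch`.

```ruby
class Opensearch
  # Stub strategy so the sketch runs standalone (hypothetical body).
  class LexicalQueryBuilder
    def build(params, _fulltext)
      { bool: { must: [{ multi_match: { query: params[:q].to_s } }] } }
    end
  end

  RESULTS_PER_PAGE = 20 # assumed page size

  # Orchestrator: top-level keys only; query details come from the selected strategy.
  def build_query(params, fulltext = false, highlight: false)
    body = {
      from: params.fetch(:from, 0).to_i,
      size: RESULTS_PER_PAGE,
      query: select_strategy(params).build(params, fulltext),
      aggregations: aggregations,
      sort: [{ _score: { order: "desc" } }] # would come from the sort builder
    }
    body[:highlight] = { fields: { "*": {} } } if highlight
    body
  end

  private

  # For now, always lexical.
  def select_strategy(_params)
    LexicalQueryBuilder.new
  end

  # Illustrative aggregation; the real names and fields live in the existing model.
  def aggregations
    { languages: { terms: { field: "languages" } } }
  end
end
```

Keeping only top-level keys in the orchestrator makes it easy to confirm that the request shape (query, aggregations, sort, highlight) is preserved while the detailed logic moves into the builders.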