Commit 58f7bd5

Add feature to enable semantic search
Why these changes are being introduced:

We need a means to toggle the new semantic search query mode in the UI.

Relevant ticket(s):

- [USE-493](https://mitlibraries.atlassian.net/browse/USE-493)

How this addresses that need:

This adds a feature to toggle semantic search on and off. Lexical search remains the default.

Side effects of this change:

- This feature is explicitly disabled for geospatial queries. If we want semantic search for GeoData in the future, we will need to revisit the code.
- There is no query param exposing this feature, so it is not currently possible to toggle on a per-query basis.
1 parent 5e653ed commit 58f7bd5

9 files changed

Lines changed: 216 additions & 3 deletions

README.md

Lines changed: 1 addition & 0 deletions
@@ -106,6 +106,7 @@ may have unexpected consequences if applied to other TIMDEX UI apps.
 - `FEATURE_TAB_TIMDEX_ALL`: Display a tab for displaying the combined TIMDEX data. `TIMDEX_INDEX` affects which data appears in this tab.
 - `FEATURE_TAB_TIMDEX_ALMA`: Display a tab for displaying Alma data from TIMDEX. `TIMDEX_INDEX` must include `Alma` data or no results will return.
 - `FEATURE_TIMDEX_FULLTEXT`: Activate fulltext searching for sources in TIMDEX that support it
+- `FEATURE_TIMDEX_SEMANTIC_SEARCH`: Enables semantic query mode (`queryMode: semantic`) for TIMDEX searches. When disabled, TIMDEX defaults to lexical search behavior.
 - `FEATURE_PRIMO_NDE_LINKS`: Enables all Primo UI links to target the NDE version of Primo. When enabled, links will use `/nde/search` and `/nde/fulldisplay` paths along with the NDE view ID from `PRIMO_NDE_VID`.
 - `FILTER_ACCESS_TO_FILES`: The name to use instead of "Access to files" for that filter / aggregation.
 - `FILTER_CONTENT_TYPE`: The name to use instead of "Content type" for that filter / aggregation.
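
The README entry above ties the `timdex_semantic_search` feature name to the `FEATURE_TIMDEX_SEMANTIC_SEARCH` environment variable, and the tests in this commit enable it with the value `'true'`. The body of `Feature.enabled?` is not part of this diff, so the following is only a minimal sketch of how such a lookup might work. The `feature_enabled?` helper, its `env` parameter, and the shortened feature list are hypothetical.

```ruby
# Hypothetical stand-in for the app's Feature class; the real implementation
# is not shown in this commit and may behave differently.
KNOWN_FEATURES = %i[geodata timdex_fulltext timdex_semantic_search].freeze

# Returns true only when the feature name is known and the matching
# FEATURE_<NAME> variable is set to 'true' (case-insensitive).
# `env` defaults to the process environment but accepts any Hash-like object.
def feature_enabled?(name, env = ENV)
  feature = name.to_s.to_sym
  return false unless KNOWN_FEATURES.include?(feature)

  env.fetch("FEATURE_#{feature.to_s.upcase}", 'false').to_s.downcase == 'true'
end
```

Under these assumptions, an unset or unrecognized flag reads as disabled, which matches the commit's statement that lexical search remains the default.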

app/controllers/search_controller.rb

Lines changed: 10 additions & 2 deletions
@@ -193,7 +193,7 @@ def query_timdex(query)
     raw = if Feature.enabled?(:geodata)
             execute_geospatial_query(query)
           else
-            TimdexBase::Client.query(TimdexSearch::BaseQuery, variables: query)
+            TimdexBase::Client.query(base_query_for_mode, variables: query)
           end
 
     # The response type is a GraphQL::Client::Response, which is not directly serializable, so we
@@ -217,17 +217,25 @@ def query_primo(per_page, offset)
   end
 
   def execute_geospatial_query(query)
+    query = query.except('queryMode')
+
     if query['geobox'] == 'true' && query[:geodistance] == 'true'
       TimdexBase::Client.query(TimdexSearch::AllQuery, variables: query)
     elsif query['geobox'] == 'true'
       TimdexBase::Client.query(TimdexSearch::GeoboxQuery, variables: query)
     elsif query['geodistance'] == 'true'
       TimdexBase::Client.query(TimdexSearch::GeodistanceQuery, variables: query)
     else
-      TimdexBase::Client.query(TimdexSearch::BaseQuery, variables: query)
+      TimdexBase::Client.query(base_query_for_mode, variables: query)
     end
   end
 
+  def base_query_for_mode
+    return TimdexSearch::BaseQuery unless Feature.enabled?(:timdex_semantic_search)
+
+    TimdexSearch::SemanticBaseQuery
+  end
+
   def extract_errors(response)
     response[:errors]['data'] if response.is_a?(Hash) && response.key?(:errors) && response[:errors].key?('data')
   end
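
The controller change above combines two decisions: which base query to use, and whether `queryMode` survives into the request. A rough, framework-free sketch of that dispatch is shown here to make the interaction easier to follow. The `select_query` method, its keyword arguments, and the symbols standing in for the parsed GraphQL constants are all hypothetical, and the sketch uses string keys throughout.

```ruby
# Illustrative only: symbols stand in for the TimdexSearch query constants,
# and the feature flags are passed in instead of read via Feature.enabled?.
def select_query(variables, geodata:, semantic:)
  base = semantic ? :semantic_base_query : :base_query
  return [base, variables] unless geodata

  # Mirrors the commit: queryMode is stripped before any geospatial dispatch.
  vars = variables.reject { |key, _value| key == 'queryMode' }
  geobox = vars['geobox'] == 'true'
  geodistance = vars['geodistance'] == 'true'

  query = if geobox && geodistance
            :all_query
          elsif geobox
            :geobox_query
          elsif geodistance
            :geodistance_query
          else
            base
          end
  [query, vars]
end

# Example: with GeoData on, a geobox search loses its queryMode variable.
query, vars = select_query(
  { 'q' => 'maps', 'geobox' => 'true', 'queryMode' => 'semantic' },
  geodata: true, semantic: true
)
puts "#{query} #{vars.keys.join(',')}"
```

One consequence worth noting, assuming the sketch matches the real flow: when GeoData is on and neither geospatial option is set, the semantic base query is still selected but without a `queryMode` variable, so the API's documented default of `"lexical"` would apply.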

app/models/feature.rb

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@
 class Feature
   # List of all valid features in the application
   VALID_FEATURES = %i[bot_detection geodata boolean_picker oa_always primo_nde_links simulate_search_latency tab_primo_all tab_timdex_all
-                      tab_timdex_alma record_link timdex_fulltext].freeze
+                      tab_timdex_alma record_link timdex_fulltext timdex_semantic_search].freeze
 
   # Check if a feature is enabled by name
   #

app/models/query_builder.rb

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ def initialize(enhanced_query)
     extract_query(enhanced_query)
     extract_geosearch(enhanced_query)
     extract_filters(enhanced_query)
+    @query['queryMode'] = 'semantic' if Feature.enabled?(:timdex_semantic_search)
     @query['index'] = ENV.fetch('TIMDEX_INDEX', nil)
     @query['booleanType'] = enhanced_query[:booleanType]
     @query.compact!
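
Because the new line only assigns `queryMode` when the flag is on, and `compact!` then drops `nil` entries, a lexical search sends no `queryMode` variable at all. The sketch below illustrates that behavior in isolation. `build_variables` and its parameters are hypothetical, and the `'q'` key is an assumption about what `extract_query` produces.

```ruby
# Illustrative stand-in for the relevant part of QueryBuilder#initialize.
# The flag and index are passed in instead of being read from the environment.
def build_variables(search, semantic:, index: nil)
  query = {}
  query['q'] = search[:q]
  query['queryMode'] = 'semantic' if semantic
  query['index'] = index
  query['booleanType'] = search[:booleanType]
  query.compact # removes nil-valued keys such as an unset index
end

p build_variables({ q: 'blah' }, semantic: false)
p build_variables({ q: 'blah' }, semantic: true)
```

Omitting the key leaves the choice of mode to the API, whose schema entry in this commit declares `"lexical"` as the default value.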

app/models/timdex_search.rb

Lines changed: 144 additions & 0 deletions
@@ -144,6 +144,150 @@ class TimdexSearch < TimdexBase
     }
   GRAPHQL
 
+  SemanticBaseQuery = TimdexBase::Client.parse <<-GRAPHQL
+    query(
+      $q: String
+      $citation: String
+      $contributors: String
+      $fundingInformation: String
+      $identifiers: String
+      $locations: String
+      $subjects: String
+      $title: String
+      $index: String
+      $from: String
+      $booleanType: String
+      $queryMode: String
+      $fulltext: Boolean
+      $perPage: Int
+      $accessToFilesFilter: [String!]
+      $contentTypeFilter: [String!]
+      $contributorsFilter: [String!]
+      $formatFilter: [String!]
+      $languagesFilter: [String!]
+      $literaryFormFilter: String
+      $placesFilter: [String!]
+      $sourceFilter: [String!]
+      $subjectsFilter: [String!]
+    ) {
+      search(
+        searchterm: $q
+        citation: $citation
+        contributors: $contributors
+        fundingInformation: $fundingInformation
+        identifiers: $identifiers
+        locations: $locations
+        subjects: $subjects
+        title: $title
+        index: $index
+        from: $from
+        booleanType: $booleanType
+        queryMode: $queryMode
+        fulltext: $fulltext
+        perPage: $perPage
+        accessToFilesFilter: $accessToFilesFilter
+        contentTypeFilter: $contentTypeFilter
+        contributorsFilter: $contributorsFilter
+        formatFilter: $formatFilter
+        languagesFilter: $languagesFilter
+        literaryFormFilter: $literaryFormFilter
+        placesFilter: $placesFilter
+        sourceFilter: $sourceFilter
+        subjectsFilter: $subjectsFilter
+      ) {
+        hits
+        records {
+          timdexRecordId
+          identifiers {
+            kind
+            value
+          }
+          title
+          source
+          contentType
+          contributors {
+            kind
+            value
+          }
+          publicationInformation
+          dates {
+            kind
+            value
+            range {
+              gte
+              lte
+            }
+          }
+          links {
+            kind
+            restrictions
+            text
+            url
+          }
+          notes {
+            kind
+            value
+          }
+          highlight {
+            matchedField
+            matchedPhrases
+          }
+          provider
+          rights {
+            kind
+            description
+            uri
+          }
+          sourceLink
+          summary
+          subjects {
+            kind
+            value
+          }
+          citation
+        }
+        aggregations {
+          accessToFiles {
+            key
+            docCount
+          }
+          contentType {
+            key
+            docCount
+          }
+          contributors {
+            key
+            docCount
+          }
+          format {
+            key
+            docCount
+          }
+          languages {
+            key
+            docCount
+          }
+          literaryForm {
+            key
+            docCount
+          }
+          places {
+            key
+            docCount
+          }
+          source {
+            key
+            docCount
+          }
+          subjects {
+            key
+            docCount
+          }
+        }
+      }
+    }
+  GRAPHQL
+
   GeoboxQuery = TimdexBase::Client.parse <<-GRAPHQL
     query(
       $q: String
config/schema/schema.json

Lines changed: 10 additions & 0 deletions
@@ -1360,6 +1360,16 @@
               },
               "defaultValue": "\"OR\""
             },
+            {
+              "name": "queryMode",
+              "description": "Search mode to use. Defaults to \"lexical\". Options include: \"lexical\", \"semantic\"",
+              "type": {
+                "kind": "SCALAR",
+                "name": "String",
+                "ofType": null
+              },
+              "defaultValue": "\"lexical\""
+            },
             {
               "name": "accessToFilesFilter",
               "description": "Filter results by access type. Use the `AccessToFiles` aggregation for a list of possible values. Multiple values are ORed.",

test/controllers/search_controller_test.rb

Lines changed: 22 additions & 0 deletions
@@ -1220,4 +1220,26 @@ def source_filter_count(controller)
     # Should not be redirected to Turnstile (doesn't hit SearchController)
     assert_response :success
   end
+
+  test 'uses BaseQuery when semantic search feature is disabled' do
+    # When the feature flag is not enabled, base_query_for_mode returns BaseQuery (default tab is 'all')
+    mock_primo_search_all_tab
+    mock_timdex_search_all_tab
+
+    get '/results?q=test'
+
+    assert_response :success
+  end
+
+  test 'uses SemanticBaseQuery when semantic search feature is enabled' do
+    # When the feature flag is enabled, base_query_for_mode returns SemanticBaseQuery (default tab is 'all')
+    ClimateControl.modify FEATURE_TIMDEX_SEMANTIC_SEARCH: 'true' do
+      mock_primo_search_all_tab
+      mock_timdex_search_all_tab
+
+      get '/results?q=test'
+
+      assert_response :success
+    end
+  end
 end

test/models/query_builder_test.rb

Lines changed: 12 additions & 0 deletions
@@ -120,4 +120,16 @@ class QueryBuilderTest < ActiveSupport::TestCase
     }
     assert_equal expected, QueryBuilder.new(search).query
   end
+
+  test 'query builder defaults to lexical mode by omitting queryMode' do
+    search = { q: 'blah' }
+    refute_includes(QueryBuilder.new(search).query.keys, 'queryMode')
+  end
+
+  test 'query builder adds semantic queryMode when feature flag is enabled' do
+    ClimateControl.modify FEATURE_TIMDEX_SEMANTIC_SEARCH: 'true' do
+      search = { q: 'blah' }
+      assert_equal('semantic', QueryBuilder.new(search).query['queryMode'])
+    end
+  end
 end

test_output.log

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+Running 582 tests in a single process (parallelization threshold is 999)
+Run options: --seed 1257
+
+# Running:
+
............................................................................................................................................................................................................................................................S........................................................................................................................................................................................................................S.S..S....SS.....S.S.S......SS.............S.S..........S..SS..S.................................................
+
+Finished in 2.615215s, 222.5438 runs/s, 584.6556 assertions/s.
+
+582 runs, 1529 assertions, 0 failures, 0 errors, 17 skips
+
+You have skipped tests. Run with --verbose for details.
+Coverage report generated for Minitest, Unit Tests to /Users/jazairi/workspace/timdex-ui/coverage.
+Line Coverage: 96.67% (1366 / 1413)
+Lcov style coverage report generated for Minitest, Unit Tests to /Users/jazairi/workspace/timdex-ui/coverage/lcov/coverage.lcov
