Skip to content

Commit 7f54f7e

Browse files
committed
Adds support for global scoring
Why are these changes being introduced: * As we moved from an OpenSearch instance where we control the shard count to OpenSearch Serverless where we do not, we need a way to more consistently score results across shards. This change adds support for global scoring (dfs_query_then_fetch under the hood) to allow us to opt-in to that behavior when we need it. Relevant ticket(s): * https://mitlibraries.atlassian.net/browse/USE-602 How does this address that need: * This adds a new feature flag for global scoring and updates the search controller to use it when enabled. Document any side effects to this change: * This mode is more expensive to run, so if we notice performance problems we may need to accept that some results are scored inconsistently across shards.
1 parent 7ca07c6 commit 7f54f7e

7 files changed

Lines changed: 106 additions & 11 deletions

File tree

AGENTS.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ This file highlights the important, discoverable conventions and workflows an AI
2020
|------|----------|
2121
| `FEATURE_BOOLEAN_PICKER` | Allow users to choose AND/OR boolean logic in searches |
2222
| `FEATURE_GEODATA` | Enable geospatial search (bounding box and radius-based queries); defaults to false |
23+
| `FEATURE_GLOBAL_SCORING` | Pass `useGlobalScoring: true` to all TIMDEX GraphQL search queries; defaults to false |
2324
| `FEATURE_OA_ALWAYS` | Always do OpenAlex lookups when DOI or PMID is detected rather than only when LibKey does not return data |
2425
| `FEATURE_RECORD_LINK` | Show "View full record" link in search results |
2526
| `FEATURE_SIMULATE_SEARCH_LATENCY` | Add 1s minimum delay to search results for testing UX behavior |

README.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -100,6 +100,7 @@ See `Optional Environment Variables` for more information.
100100
- `FEATURE_GEODATA`: Enables features related to geospatial data discovery. Setting this variable to `true` will trigger geodata
101101
mode. Note that this is currently intended _only_ for the geodata app and
102102
may have unexpected consequences if applied to other TIMDEX UI apps.
103+
- `FEATURE_GLOBAL_SCORING`: Pass `useGlobalScoring: true` to all TIMDEX GraphQL search queries; defaults to false
103104
- `FEATURE_OA_ALWAYS`: Enables OpenAccess links from OpenAlex whenever they are available, not just when LibKey does not return data. `OPENALEX_EMAIL` must also be set.
104105
- `FEATURE_RECORD_LINK`: Display the 'View full record' link below each record.
105106
- `FEATURE_SIMULATE_SEARCH_LATENCY`: DO NOT SET IN PRODUCTION. Set to ensure a minimum of a one second delay in returning search results. Useful to see spinners/loaders. Only introduces delay for results that take less than one second to complete.

app/controllers/search_controller.rb

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -191,6 +191,7 @@ def query_timdex(query)
191191
query[:sourceFilter] = ['DSpace@MIT'] if @active_tab == 'dspace'
192192
query[:sourceFilter] = ['Research Databases'] if @active_tab == 'databases'
193193
query[:sourceFilter] = ['OpenGeoMetadata GIS Resources', 'MIT GIS Resources'] if @active_tab == 'geodata'
194+
query[:useGlobalScoring] = Feature.enabled?(:global_scoring)
194195

195196
# We generate unique cache keys to avoid naming collisions.
196197
cache_key = CacheKeyGenerator.call(query)

app/models/feature.rb

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -33,8 +33,8 @@
3333
#
3434
class Feature
3535
# List of all valid features in the application
36-
VALID_FEATURES = %i[bot_detection geodata boolean_picker oa_always primo_nde_links simulate_search_latency
37-
tab_primo_all tab_timdex_all tab_timdex_alma record_link timdex_fulltext
36+
VALID_FEATURES = %i[bot_detection geodata boolean_picker global_scoring oa_always primo_nde_links
37+
simulate_search_latency tab_primo_all tab_timdex_all tab_timdex_alma record_link timdex_fulltext
3838
timdex_semantic_search].freeze
3939

4040
# Check if a feature is enabled by name

app/models/timdex_search.rb

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -27,6 +27,7 @@ class TimdexSearch < TimdexBase
2727
$placesFilter: [String!]
2828
$sourceFilter: [String!]
2929
$subjectsFilter: [String!]
30+
$useGlobalScoring: Boolean
3031
) {
3132
search(
3233
searchterm: $q
@@ -52,6 +53,7 @@ class TimdexSearch < TimdexBase
5253
placesFilter: $placesFilter
5354
sourceFilter: $sourceFilter
5455
subjectsFilter: $subjectsFilter
56+
useGlobalScoring: $useGlobalScoring
5557
) {
5658
hits
5759
records {
@@ -171,6 +173,7 @@ class TimdexSearch < TimdexBase
171173
$literaryFormFilter: String
172174
$sourceFilter: [String!]
173175
$subjectsFilter: [String!]
176+
$useGlobalScoring: Boolean
174177
) {
175178
search(
176179
searchterm: $q
@@ -184,6 +187,7 @@ class TimdexSearch < TimdexBase
184187
index: $index
185188
from: $from
186189
booleanType: $booleanType
190+
useGlobalScoring: $useGlobalScoring
187191
geobox: {
188192
minLongitude: $geoboxMinLongitude,
189193
minLatitude: $geoboxMinLatitude,
@@ -299,6 +303,7 @@ class TimdexSearch < TimdexBase
299303
$literaryFormFilter: String
300304
$sourceFilter: [String!]
301305
$subjectsFilter: [String!]
306+
$useGlobalScoring: Boolean
302307
) {
303308
search(
304309
searchterm: $q
@@ -312,6 +317,7 @@ class TimdexSearch < TimdexBase
312317
index: $index
313318
from: $from
314319
booleanType: $booleanType
320+
useGlobalScoring: $useGlobalScoring
315321
geodistance: {
316322
distance: $geodistanceDistance,
317323
latitude: $geodistanceLatitude,
@@ -430,6 +436,7 @@ class TimdexSearch < TimdexBase
430436
$literaryFormFilter: String
431437
$sourceFilter: [String!]
432438
$subjectsFilter: [String!]
439+
$useGlobalScoring: Boolean
433440
) {
434441
search(
435442
searchterm: $q
@@ -462,6 +469,7 @@ class TimdexSearch < TimdexBase
462469
literaryFormFilter: $literaryFormFilter
463470
sourceFilter: $sourceFilter
464471
subjectsFilter: $subjectsFilter
472+
useGlobalScoring: $useGlobalScoring
465473
) {
466474
hits
467475
records {

test/controllers/search_controller_test.rb

Lines changed: 92 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -63,6 +63,10 @@ def mock_primo_search_all_tab
6363
end
6464

6565
def mock_primo_search_with_hits(total_hits)
66+
# Use when the test needs to control the simulated Primo hit count, e.g., pagination
67+
# boundary tests or UI behavior that reacts to total result numbers. Always returns
68+
# 10 sample documents regardless of `total_hits`; only the reported total varies.
69+
# For tests that don't care about hit counts, prefer `mock_primo_search_success`.
6670
sample_docs = (1..10).map do |i|
6771
{
6872
title: "Sample Primo Document Title #{i}",
@@ -86,11 +90,13 @@ def mock_primo_search_with_hits(total_hits)
8690
end
8791

8892
def mock_timdex_search_success
89-
# Mock the TIMDEX GraphQL client to avoid external API calls (single call)
93+
# Use for standard single-tab TIMDEX tests that don't inspect the query variables
94+
# passed to the GraphQL client. Sets `expects(:query)` without `.with(...)`, so
95+
# Mocha only verifies the call count, not the arguments.
9096
#
91-
# NOTE: As with Primo, this helper assumes a single TIMDEX invocation.
92-
# Tests exercising the 'all' tab should use `mock_timdex_search_all_tab`
93-
# which allows multiple calls.
97+
# - For the 'all' tab (multiple TIMDEX calls): use `mock_timdex_search_all_tab`.
98+
# - For tests asserting on query variables (e.g., feature flags): use
99+
# `build_timdex_mock_response` and set the expectation with `.with(...)` yourself.
94100
sample_result = {
95101
'api' => 'timdex',
96102
'title' => 'Sample TIMDEX Document Title',
@@ -134,11 +140,10 @@ def mock_timdex_search_success
134140
end
135141

136142
def mock_timdex_search_all_tab
137-
# Mock the TIMDEX GraphQL client for the all tab (multiple calls)
138-
#
139-
# This helper is intentionally separate from `mock_timdex_search_success`
140-
# because the merged-search orchestration can invoke TIMDEX multiple
141-
# times. The helper therefore uses `at_least_once` on the expectation.
143+
# Use for tests exercising the 'all' tab, where `MergedSearchService` may invoke
144+
# TIMDEX multiple times. Relaxes the expectation to `at_least_once` to accommodate
145+
# that orchestration. For single-tab TIMDEX tests, prefer `mock_timdex_search_success`
146+
# so unexpected extra calls cause a failure rather than passing silently.
142147
sample_result = {
143148
'api' => 'timdex',
144149
'title' => 'Sample TIMDEX Document Title',
@@ -182,6 +187,11 @@ def mock_timdex_search_all_tab
182187
end
183188

184189
def mock_timdex_search_with_hits(total_hits)
190+
# Use when the test needs to control the simulated TIMDEX hit count, e.g., pagination
191+
# threshold or no-results-message tests. Always returns 10 sample documents regardless
192+
# of `total_hits`; only the reported total varies. Unlike the other TIMDEX helpers,
193+
# this one also mocks `NormalizeTimdexResults` explicitly. For tests that don't care
194+
# about hit counts, prefer `mock_timdex_search_success`.
185195
sample_results = (1..10).map do |i|
186196
{
187197
'title' => "Sample TIMDEX Document Title #{i}",
@@ -225,6 +235,54 @@ def mock_timdex_search_with_hits(total_hits)
225235
NormalizeTimdexResults.expects(:new).returns(mock_normalizer).at_least_once
226236
end
227237

238+
def build_timdex_mock_response
239+
# Use when a test needs to assert on the variables passed to `TimdexBase::Client.query`
240+
# (e.g., verifying a feature flag sets the correct query variable). Returns a fully
241+
# stubbed response object but intentionally does NOT set any expectation on
242+
# `TimdexBase::Client`. The caller is responsible for setting the expectation:
243+
#
244+
# mock_response = build_timdex_mock_response
245+
# TimdexBase::Client.expects(:query).with(TimdexSearch::BaseQuery,
246+
# has_entry(:variables, has_entry(:someVar, expected_value))).returns(mock_response)
247+
#
248+
# The other TIMDEX helpers (`mock_timdex_search_success`, etc.) set the expectation
249+
# themselves without `.with(...)`, so they cannot be used when argument assertions
250+
# are needed — Mocha would end up with two conflicting expectations on the same method.
251+
sample_result = {
252+
'api' => 'timdex',
253+
'title' => 'Sample TIMDEX Document Title',
254+
'timdexRecordId' => 'sample-record-123',
255+
'contentType' => ['Article'],
256+
'dates' => [{ 'kind' => 'Publication date', 'value' => '2023' }],
257+
'contributors' => [{ 'value' => 'Foo Barston', 'kind' => 'Creator' }],
258+
'sourceLink' => 'https://example.com/record'
259+
}
260+
261+
mock_response = mock('timdex_response')
262+
mock_errors = mock('timdex_errors')
263+
mock_errors.stubs(:details).returns({})
264+
mock_errors.stubs(:to_h).returns({})
265+
mock_response.stubs(:errors).returns(mock_errors)
266+
267+
mock_data = mock('timdex_data')
268+
mock_search = mock('timdex_search')
269+
mock_search.stubs(:to_h).returns({
270+
'hits' => 1,
271+
'aggregations' => {},
272+
'records' => [sample_result]
273+
})
274+
mock_data.stubs(:search).returns(mock_search)
275+
mock_data.stubs(:to_h).returns({
276+
'search' => {
277+
'hits' => 1,
278+
'aggregations' => {},
279+
'records' => [sample_result]
280+
}
281+
})
282+
mock_response.stubs(:data).returns(mock_data)
283+
mock_response
284+
end
285+
228286
test 'index shows basic search form by default' do
229287
get '/'
230288
assert_response :success
@@ -1220,4 +1278,29 @@ def source_filter_count(controller)
12201278
# Should not be redirected to Turnstile (doesn't hit SearchController)
12211279
assert_response :success
12221280
end
1281+
1282+
# FEATURE_GLOBAL_SCORING tests
1283+
test 'timdex query passes useGlobalScoring: false when FEATURE_GLOBAL_SCORING is disabled' do
1284+
mock_response = build_timdex_mock_response
1285+
TimdexBase::Client.expects(:query).with(
1286+
TimdexSearch::BaseQuery,
1287+
has_entry(:variables, has_entry(:useGlobalScoring, false))
1288+
).returns(mock_response)
1289+
1290+
get '/results?q=data&tab=timdex'
1291+
assert_response :success
1292+
end
1293+
1294+
test 'timdex query passes useGlobalScoring: true when FEATURE_GLOBAL_SCORING is enabled' do
1295+
mock_response = build_timdex_mock_response
1296+
ClimateControl.modify(FEATURE_GLOBAL_SCORING: 'true') do
1297+
TimdexBase::Client.expects(:query).with(
1298+
TimdexSearch::BaseQuery,
1299+
has_entry(:variables, has_entry(:useGlobalScoring, true))
1300+
).returns(mock_response)
1301+
1302+
get '/results?q=data&tab=timdex'
1303+
assert_response :success
1304+
end
1305+
end
12231306
end

test/models/feature_test.rb

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,6 +4,7 @@ class FeatureTest < ActiveSupport::TestCase
44
test 'defined features default to false' do
55
refute Feature.enabled?(:geodata)
66
refute Feature.enabled?(:boolean_picker)
7+
refute Feature.enabled?(:global_scoring)
78
end
89

910
test 'undefined features return false' do

0 commit comments

Comments
 (0)