Skip to content
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .env.test
Original file line number Diff line number Diff line change
@@ -1,9 +1,11 @@
ALMA_OPENURL=https://na06.alma.exlibrisgroup.com/view/uresolver/01MIT_INST/openurl?
EXL_INST_ID=01MIT_INST
TURNSTILE_SITEKEY=test-sitekey
TURNSTILE_SECRET=test-secret
FEATURE_TIMDEX_FULLTEXT=true
FEATURE_GEODATA=false
FEATURE_PRIMO_NDE_LINKS=false
MIT_ALMA_URL=https://mit.alma.exlibrisgroup.com
Comment thread
matt-bernhardt marked this conversation as resolved.
MIT_PRIMO_URL=https://mit.primo.exlibrisgroup.com
OPENALEX_EMAIL=FAKE_OPENALEX_EMAIL
PRIMO_API_KEY=FAKE_PRIMO_API_KEY
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,9 @@
.env
.env.development

# qlty
.qlty

## Environment normalization:
/.bundle
/vendor/bundle
Expand Down
141 changes: 141 additions & 0 deletions app/models/alma_sru.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# frozen_string_literal: true

# Queries the Alma SRU endpoint for holdings data
#
# @reference https://developers.exlibrisgroup.com/alma/integrations/SRU/
class AlmaSru
class LookupFailure < StandardError; end

class InvalidAlmaId < StandardError; end

NAMESPACE = { 'holding' => 'http://www.loc.gov/MARC21/slim' }.freeze

LOCATION_ORDER = {
'Hayden Library' => 0,
'Lewis Music Library' => 1,
'Rotch Library' => 2,
'Barker Library' => 3,
'Dewey Library' => 4
}.freeze

# lookup is the primary method of interacting with this model.
#
# It will receive an Alma ID, validate it, look it up in the Alma SRU, and return a formatted result.
#
# It accepts an "alma_client" argument for use when testing, but this is not used in normal operations.
def self.lookup(raw_identifier, alma_client: nil)
return [] unless alma_base_url

# Validate the raw identifier received. This will raise an InvalidAlmaId if validation fails.
identifier = validate_alma_id(raw_identifier)
Comment thread
matt-bernhardt marked this conversation as resolved.

# Build URL
url = alma_sru_url(identifier)

# Retrieve that URL
alma_http = setup(url, alma_client)

parse_response(alma_http.timeout(6).get(url), identifier)
rescue InvalidAlmaId
Rails.logger.debug("Invalid Alma ID: #{raw_identifier}")

[]
rescue LookupFailure => e
Rails.logger.debug("Alma lookup failure: #{e}")

[]
rescue HTTP::Error
Sentry.capture_message('Alma SRU connection failure')
Rails.logger.error('Alma SRU connection error')

[]
Comment thread
qltysh[bot] marked this conversation as resolved.
Comment thread
JPrevost marked this conversation as resolved.
end

# parse_response receives the raw response from the Alma SRU endpoint.
#
# For any non-200 response, it raises a LookupFailure.
#
# Other responses (in XML format) are parsed by Nokogiri, and we pluck content with an `AVA` tag.
def self.parse_response(raw_response, reference_identifier)
raise LookupFailure, raw_response.status unless raw_response.status == 200

parsed = Nokogiri::XML(raw_response.body.to_s)

# Confirm that control field 001 matches the identifier we received.
parsed_controlfield = fetch_controlfield(parsed)
raise LookupFailure, 'Control field mismatch' unless parsed_controlfield == reference_identifier

# Look up all AVA tags
parsed_availabilities = fetch_availabilities(parsed)

parsed_availabilities.map(&method(:format_availability))
end

# validate_alma_id ensures we are only submitting valid Alma IDs to the SRU endpoint.
#
# It needs to do two thigns:
Comment thread
Copilot marked this conversation as resolved.
Outdated
# 1. Remove the "alma" prefix if one is present. Otherwise, no manipulation of the submitted value should occur.
# 2. Enforce the formatting requirements for a valid alma identifier (start with "99", and end with "6761").
def self.validate_alma_id(raw)
parsed = if raw.start_with?('alma')
raw.delete_prefix('alma')
else
raw
end

raise InvalidAlmaId unless parsed.to_s.start_with?('99')
raise InvalidAlmaId unless parsed.to_s.end_with?('6761')

parsed
Comment thread
matt-bernhardt marked this conversation as resolved.
end
Comment thread
matt-bernhardt marked this conversation as resolved.
Comment on lines +79 to +89

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW this really is a pretty good suggestion. I'm fine with not going in this direction as in theory we should be able to trust the input and the extra cost of running a lookup on potential nonsense like alma99helloworld6761 is probably low. Ensuring we have a test for things like alma99helloworld6761 makes sense though so we understand what to expect (I haven't confirmed if we have that style test yet already)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The objection in this comment is a bit different than the earlier round of comments that focused on non-numeric values like alma99helloworld6761 (a condition which is already covered, I think, by the assertion that a document ID of foo doesn't break anything and just returns the empty response).

The behavior I think this comment is calling out is that the initial parsing will fail on a value like nil, which... fair. I'm updating the function to be robust to this condition, and make sure we avoid NoMethodError problems.

I'm still debating whether to add a condition in the validation to raise InvalidAlmaId with non-numeric values while I work through the other feedback.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. What I'm agreeing with is that 99helloworld6761 is not actually a valid Alma id but we don't check that these are all digits so we actually send that known not-an-almaid at Alma that then returns nothing right? It would be better to know "that isn't an alma id, let's not waste the effort to ask alma what the holdings are".

In other words, our code doesn't break as-is, but it is missing an arguably important edge case about what an Alma ID looks like. My initial spec for this didn't account for this. It's likely no production alma id will ever not be an integer after we strip the prefix... but if it isn't an integer, we sure as heck shouldn't query Alma about it ;)


# ava_to_hash takes an XML element that represents a single availability record
# and converts it to a hash. Each code is a key, while its text is the value.
def self.ava_to_hash(node)
rebuilt = {}

node.children.each do |child|
rebuilt[child.attribute_nodes[0].value] = child.text if child.instance_of?(Nokogiri::XML::Element)
end

rebuilt
end
Comment on lines +93 to +101

# fetch_availabilities receives a parsed XML document (Nokogiri::XML::Document)
#
# This document is parsed using xpath to select on the nodes with an tag of AVA,
# and these are then sorted based on a preferred library order.
def self.fetch_availabilities(parsed_xml)
ava_list = parsed_xml.xpath("//holding:datafield[@tag='AVA']", NAMESPACE)

ava_list
.map { |el| ava_to_hash(el) }
.sort_by { |el| [LOCATION_ORDER.fetch(el['q'], 999), el['q'].to_s] }
end

# fetch_controlfield receives a parsed XML document (Nokogiri::XML::Document)
# and returns the controlfield with an 001 tag, if one exists.
def self.fetch_controlfield(parsed_xml)
Comment thread
JPrevost marked this conversation as resolved.
parsed_xml.xpath("//holding:controlfield[@tag='001']", NAMESPACE)&.text
end

# format_availability receives a hash representing a single availability
# statement, and formats it for human readability.
def self.format_availability(availability)
"#{availability['e']&.humanize} at #{availability['q']} #{availability['c']} (#{availability['d']})"
end
Comment thread
JPrevost marked this conversation as resolved.

def self.alma_base_url
ENV.fetch('MIT_ALMA_URL', nil)
end

def self.alma_sru_url(identifier)
# example identifier: 990000959610106761
"#{alma_base_url}/view/sru/#{ENV.fetch('EXL_INST_ID')}?version=1.2&operation=searchRetrieve&recordSchema=marcxml" \
"&query=alma.all_for_ui=#{identifier}"
Comment thread
matt-bernhardt marked this conversation as resolved.
end

def self.setup(url, alma_client)
alma_client || HTTP.persistent(url)
end
end
Loading