Commit 4ecfd35

Implement batch preparation methods for digitized theses workflow
Why these changes are being introduced:
* The batch preparation methods for the digitized theses workflow must be capable of performing the following functions:
  1. Sync the batch folder from the digitized theses workflow S3 bucket to a minted batch folder in the DSC S3 bucket
  2. Download metadata from Alma via SRU
  3. Get an item from DSpace
  4. Determine if an item is a valid 'replacement thesis', given DSpace item metadata
  5. Organize the contents of the minted batch folder into 'new-theses/' and 'replacement-theses/' subfolders

How this addresses that need:
* Implement methods on DigitizedThesesWorkflow for batch preparation functions
* Expand ItemSubmissionStatus to `CREATE_SUCCESS`, `CREATE_FAILED`, `CREATE_SKIPPED`
* Add exceptions to handle item retrieval from DSpace

Side effects of this change:
* Though we've expanded ItemSubmissionStatus to include `CREATE_*` statuses, `BATCH_CREATED` remains to support backwards compatibility with other workflows. Updating all workflows to use `CREATE_*` will be in the DSC backlog for future work.

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1626
1 parent 0c852ac commit 4ecfd35

13 files changed

Lines changed: 1619 additions & 4 deletions

dsc/config.py

Lines changed: 4 additions & 2 deletions
@@ -83,11 +83,13 @@ def warning_only_loggers(self) -> list:

     # Workflow-specific env vars
     @property
-    def dspace_credentials(self) -> str:
+    def dspace_credentials(self) -> dict:
         value = os.getenv("DSPACE_CREDENTIALS")
         if not value:
             raise OSError("Env var 'DSPACE_CREDENTIALS' must be defined")
-        return value
+        credentials = json.loads(value)
+
+        return {"IR-8": credentials["ir-8"], "DDC-8": credentials["ddc-8"]}

     @property
     def metadata_api_url(self) -> str:
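
As a rough illustration of the new return type, the sketch below reproduces the parsing logic from the diff as a standalone function. The shape of each credentials entry (`url`, `user`) and the example hostnames are assumptions made for the example; the commit only shows that the JSON must contain the keys `ir-8` and `ddc-8`.

```python
import json
import os


def dspace_credentials() -> dict:
    """Standalone copy of the updated property's logic, for illustration."""
    value = os.getenv("DSPACE_CREDENTIALS")
    if not value:
        raise OSError("Env var 'DSPACE_CREDENTIALS' must be defined")
    credentials = json.loads(value)

    # Lowercase keys in the env var map to uppercase keys in the result.
    return {"IR-8": credentials["ir-8"], "DDC-8": credentials["ddc-8"]}


# Hypothetical value: the structure of each entry is not shown in this commit.
os.environ["DSPACE_CREDENTIALS"] = json.dumps(
    {
        "ir-8": {"url": "https://ir.example.org", "user": "ir-user"},
        "ddc-8": {"url": "https://ddc.example.org", "user": "ddc-user"},
    }
)

creds = dspace_credentials()
print(sorted(creds))
```

Note that, as written in the diff, a value that is not valid JSON raises `json.JSONDecodeError`, and a value missing either key raises `KeyError`, rather than the `OSError` used for an unset variable.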

dsc/db/models.py

Lines changed: 3 additions & 0 deletions
@@ -21,6 +21,9 @@


 class ItemSubmissionStatus(StrEnum):
+    CREATE_SUCCESS = "create_success"
+    CREATE_FAILED = "create_failed"
+    CREATE_SKIPPED = "create_skipped"
     BATCH_CREATED = "batch_created"
     SUBMIT_SUCCESS = "submit_success"
     SUBMIT_FAILED = "submit_failed"
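
A minimal sketch of how the new members behave, using a trimmed copy of the enum rather than the real `dsc.db.models` module. The fallback import is an assumption added only so the example runs on Python versions older than 3.11; the project itself uses `StrEnum` directly.

```python
try:
    from enum import StrEnum
except ImportError:  # Python < 3.11: stand-in for illustration only
    from enum import Enum

    class StrEnum(str, Enum):
        """Minimal substitute with string-valued members."""


class ItemSubmissionStatus(StrEnum):
    # Trimmed copy of the members shown in the diff.
    CREATE_SUCCESS = "create_success"
    CREATE_FAILED = "create_failed"
    CREATE_SKIPPED = "create_skipped"
    BATCH_CREATED = "batch_created"


# Members compare equal to their plain string values.
is_equal = ItemSubmissionStatus.CREATE_SKIPPED == "create_skipped"

# A stored string can be converted back to a member by value.
status = ItemSubmissionStatus("create_failed")
```

Because `BATCH_CREATED` is kept alongside the `CREATE_*` members, records written by other workflows with the older status should still resolve by value.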

dsc/exceptions.py

Lines changed: 37 additions & 0 deletions
@@ -30,6 +30,43 @@ class SQSMessageSendError(Exception):
     pass


+# Exceptions for DSpace client
+class DSpaceClientError(Exception):
+    """General exception raised when DSpace client action results in error."""
+
+
+class DSpaceClientCredentialsNotFoundError(DSpaceClientError):
+    """Raise when env var DSPACE_CREDENTIALS does not include submission system."""
+
+
+class DSpaceClientAuthenticationError(DSpaceClientError):
+    """Raise when DSpace client fails authentication.
+
+    Authentication may fail due to 401 Unauthorized or 403 Forbidden errors.
+    """
+
+    def __init__(
+        self,
+        dspace_url: str | float | None,
+        dspace_user: str | float | None,
+    ):
+        self.message = (
+            f"Failed to authenticate to DSpace server at '{dspace_url}' with user "
+            f"'{dspace_user}'. Please verify that the DSPACE_CREDENTIALS "
+            "environment variable is set correctly and that the DSpace server is "
+            "accessible."
+        )
+
+
+class DSpaceClientSearchError(DSpaceClientError):
+    """Raise when DSpace client search operation results in error.
+
+    Search is performed by dspace_rest_client.client.search_objects,
+    which returns None if the response from a GET request returns an
+    exit code other than 200.
+    """
+
+
 # Exceptions for 'create-batch' step
 class BatchCreationFailedError(Exception):
     def __init__(self, errors: list[tuple]) -> None:
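
The sketch below shows one way a caller might rely on the shared `DSpaceClientError` base class. The exception classes are abbreviated copies of those in the diff; the `describe` helper and the example URL and user are hypothetical and not part of the commit.

```python
class DSpaceClientError(Exception):
    """General exception raised when DSpace client action results in error."""


class DSpaceClientSearchError(DSpaceClientError):
    """Raise when DSpace client search operation results in error."""


class DSpaceClientAuthenticationError(DSpaceClientError):
    """Raise when DSpace client fails authentication."""

    def __init__(self, dspace_url, dspace_user):
        # Abbreviated form of the message built in the diff.
        self.message = (
            f"Failed to authenticate to DSpace server at '{dspace_url}' "
            f"with user '{dspace_user}'."
        )


def describe(error: Exception) -> str:
    """Hypothetical helper: raise the error and report how it was caught."""
    try:
        raise error
    except DSpaceClientAuthenticationError as exc:
        # Most specific handler first; the text lives on `exc.message`.
        return f"auth: {exc.message}"
    except DSpaceClientError as exc:
        # Any other DSpace client error is caught via the base class.
        return f"client: {type(exc).__name__}"


auth_result = describe(
    DSpaceClientAuthenticationError("https://dspace.example.org", "svc-user")
)
search_result = describe(DSpaceClientSearchError())
```

In this sketch the authentication message is read from the `message` attribute, since the constructor in the diff stores it there rather than passing it to `Exception.__init__`.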
Lines changed: 3 additions & 2 deletions
@@ -1,3 +1,4 @@
-from dsc.workflows.digitized_theses.transformer import DigitizedThesesTransformer, NSMAP
+from dsc.workflows.digitized_theses.transformer import NSMAP, DigitizedThesesTransformer
+from dsc.workflows.digitized_theses.workflow import DigitizedTheses

-__all__ = ["DigitizedThesesTransformer", "NSMAP"]
+__all__ = ["NSMAP", "DigitizedTheses", "DigitizedThesesTransformer"]
