
Digitized theses workflow retrieves item handles for synced replacement theses #244

Merged
jonavellecuerdo merged 1 commit into main from IN-1756-digitized-theses-synced-batches on Jun 3, 2026

@jonavellecuerdo jonavellecuerdo commented Jun 1, 2026

Contributor

Purpose and background context

This PR updates the digitized theses workflow to retrieve item handles for replacement theses during the batch creation process when synced=True (i.e., the batch folder was synced from the DSC S3 bucket in Stage to the DSC S3 bucket in Prod).

As shown in DigitizedTheses.prepare_batch, when synced=True, the workflow derives a list of ItemSubmissions based on the contents of the batch folder synced to S3_BUCKET_SUBMISSION_ASSETS rather than repeating the steps outlined in DigitizedTheses._create_batch_in_s3. Effectively, this means:

  • Skipping the step that syncs the batch from the workspace S3 bucket into DSC
  • Skipping downloading of metadata from Alma
  • Skipping checks via DSpace client to determine new, replacement, or skipped theses

When the batch was previously prepared and synced=True, all this information can be derived from the contents of the batch folder (e.g., the thesis type is determined by the subfolder in which the item submission is located: dsc-upload/digitized-theses/batch/<thesis-type>/).
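As a rough illustration of deriving the thesis type from the folder layout described above, here is a minimal sketch. The function name `thesis_type_from_key`, the example keys, and the subfolder names are assumptions made for illustration; they are not taken from the workflow code.

```python
from pathlib import PurePosixPath


def thesis_type_from_key(key: str, batch_prefix: str) -> str:
    """Return the first folder name found under the batch prefix.

    Hypothetical helper: assumes keys look like
    <batch_prefix>/<thesis-type>/<item>/<file>.
    """
    # relative_to raises ValueError if the key is not under the prefix.
    relative = PurePosixPath(key).relative_to(PurePosixPath(batch_prefix))
    if len(relative.parts) < 2:
        raise ValueError(f"Key has no thesis-type subfolder: {key}")
    return relative.parts[0]


# Example keys are made up for this sketch.
prefix = "dsc-upload/digitized-theses/batch"
print(thesis_type_from_key(f"{prefix}/replacement/thesis-001/thesis-001.pdf", prefix))
print(thesis_type_from_key(f"{prefix}/new/thesis-002/thesis-002.pdf", prefix))
```

Using `PurePosixPath` keeps the parsing independent of the local operating system, since S3 keys always use forward slashes.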

However, as noted in this Copilot review, DigitizedTheses._get_item_submissions_from_synced_batch was missing a step to retrieve the handle for a DSpace item in the case of a replacement thesis. This PR adds a new code block, which requires using a DSpace client to retrieve the Item object from DSpace and extract the handle attribute.
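The handle lookup described above might look roughly like the following sketch. `Item`, `Submission`, `attach_handle`, and the `lookup` callable are simplified stand-ins invented here; the real `ItemSubmission` class and DSpace client in the repository may be shaped differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Item:
    """Stand-in for the Item object returned by a DSpace client (assumed shape)."""
    handle: str


@dataclass
class Submission:
    """Simplified stand-in for an item submission (hypothetical fields)."""
    item_identifier: str
    thesis_type: str
    handle: Optional[str] = None


def attach_handle(
    submission: Submission,
    lookup: Callable[[str], Optional[Item]],
) -> Submission:
    """Set the handle on replacement theses only; other types are untouched."""
    if submission.thesis_type != "replacement":
        return submission
    item = lookup(submission.item_identifier)
    if item is None:
        raise LookupError(f"No DSpace item found for {submission.item_identifier}")
    submission.handle = item.handle
    return submission


# A fake lookup takes the place of the DSpace client call in this sketch.
fake_lookup = lambda identifier: Item(handle="1721.1/12345")
replacement = attach_handle(Submission("thesis-001", "replacement"), fake_lookup)
new = attach_handle(Submission("thesis-002", "new"), fake_lookup)
print(replacement.handle, new.handle)
```

Passing the lookup in as a callable keeps the sketch testable without a live DSpace instance.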

Notes:

How can a reviewer manually see the effects of these changes?

Review the added unit test.

Note: While it was straightforward to add a unit test for the synced=True path, creating unit tests for the synced=False path (i.e., DigitizedTheses._create_batch_in_s3) is harder given all the steps involved. That said, there are unit tests for the smaller sub-methods it relies on, and previous testing with MinIO demonstrates that the code works as expected. Just continuing to share with reviewers that improving and adding unit tests continues to be on my mind!
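For readers unfamiliar with this style of test, a unit test for the synced=True path could be structured along these lines. This is a sketch only: `resolve_handles` is a hypothetical function standing in for the workflow method, and the dict-based submissions are a simplification, not the repository's test code.

```python
from types import SimpleNamespace
from unittest.mock import Mock


def resolve_handles(submissions, get_item):
    """Hypothetical function under test: look up handles for replacements only."""
    for submission in submissions:
        if submission["thesis_type"] == "replacement":
            submission["handle"] = get_item(submission["item_identifier"]).handle
    return submissions


def test_replacement_thesis_gets_handle():
    # The mock replaces the DSpace lookup, so no network access is needed.
    get_item = Mock(return_value=SimpleNamespace(handle="1721.1/00000"))
    submissions = [
        {"item_identifier": "thesis-new", "thesis_type": "new", "handle": None},
        {"item_identifier": "thesis-rep", "thesis_type": "replacement", "handle": None},
    ]

    result = resolve_handles(submissions, get_item)

    assert result[0]["handle"] is None
    assert result[1]["handle"] == "1721.1/00000"
    # Only the replacement thesis should trigger a lookup.
    get_item.assert_called_once_with("thesis-rep")


test_replacement_thesis_gets_handle()
```

Asserting on the mock's call count checks that new theses do not cause unnecessary DSpace requests.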

Includes new or updated dependencies?

NO

Changes expectations for external applications?

NO

What are the relevant tickets?

Code review

  • Code review best practices are documented here and you are encouraged to have a constructive dialogue with your reviewers about their preferences and expectations.

@coveralls

coveralls commented Jun 1, 2026


Coverage Report for CI Build 26885385138

Warning

No base build found for commit c971b02 on main.
Coverage changes can't be calculated without a base build.
If a base build is processing, this comment will update automatically when it completes.

Coverage: 83.35%

Details

  • Patch coverage: 6 uncovered changes across 1 file (16 of 22 lines covered, 72.73%).

Uncovered Changes

| File | Changed | Covered | % |
| --- | --- | --- | --- |
| dsc/workflows/digitized_theses/workflow.py | 22 | 16 | 72.73% |

Coverage Regressions

Requires a base build to compare against.


Coverage Stats

Relevant Lines: 2048
Covered Lines: 1707
Line Coverage: 83.35%
Coverage Strength: 0.83 hits per line


@jonavellecuerdo jonavellecuerdo force-pushed the IN-1756-digitized-theses-synced-batches branch from 28a66b2 to a239203 Compare June 1, 2026 15:49
@jonavellecuerdo jonavellecuerdo changed the base branch from main to IN-1748-update-digitized-theses-transformer June 1, 2026 15:50
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1756-digitized-theses-synced-batches branch from a239203 to db736f0 Compare June 1, 2026 17:43
@jonavellecuerdo jonavellecuerdo changed the title [wip] Digitized theses workflow retrieves item handles for synced replacement theses Jun 1, 2026
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1748-update-digitized-theses-transformer branch from 32df1bf to 768c9a1 Compare June 1, 2026 17:48
Base automatically changed from IN-1748-update-digitized-theses-transformer to main June 1, 2026 17:50
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1756-digitized-theses-synced-batches branch from db736f0 to 7c55b01 Compare June 1, 2026 17:54
@jonavellecuerdo jonavellecuerdo marked this pull request as ready for review June 1, 2026 17:56
@jonavellecuerdo jonavellecuerdo requested a review from a team as a code owner June 1, 2026 17:56
@jonavellecuerdo jonavellecuerdo requested a review from ehanson8 June 1, 2026 17:59
@ehanson8 ehanson8 requested a review from Copilot June 1, 2026 20:13

Copilot AI left a comment


Pull request overview

Updates the digitized theses workflow's synced-batch path to retrieve a DSpace handle for replacement theses by calling _get_item_from_dspace, and refactors the item submission construction to assign status/status_details per thesis type. Also replaces string status literals with ItemSubmissionStatus enum values in _create_batch_in_s3.

Changes:

  • In _get_item_submissions_from_synced_batch, build ItemSubmission first, then look up DSpace handle for replacement theses and set status/details accordingly.
  • Switch hard-coded status strings to ItemSubmissionStatus enum in _create_batch_in_s3 and consolidate the item_submissions.append call.
  • Add a unit test for the synced-batch handle retrieval path; rename mock_s3_digitized_theses fixture to mock_s3_digitized_theses_dsc.
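To illustrate the move from string literals to enum values and the per-type status assignment summarized above, here is a minimal sketch. `SubmissionStatus`, its members, the detail strings, and `status_for` are hypothetical; the actual `ItemSubmissionStatus` enum in the repository may define different members.

```python
from enum import Enum
from typing import Optional, Tuple


class SubmissionStatus(str, Enum):
    """Hypothetical stand-in for ItemSubmissionStatus; member names are assumed."""
    READY = "ready"
    SKIPPED = "skipped"
    FAILED = "failed"


def status_for(
    thesis_type: str, handle: Optional[str] = None
) -> Tuple[SubmissionStatus, str]:
    """Map a thesis type to a (status, status_details) pair."""
    if thesis_type == "new":
        return SubmissionStatus.READY, "new thesis"
    if thesis_type == "replacement":
        if handle:
            return SubmissionStatus.READY, f"replaces item {handle}"
        return SubmissionStatus.FAILED, "no handle found for replacement thesis"
    if thesis_type == "skipped":
        return SubmissionStatus.SKIPPED, "skipped during batch preparation"
    raise ValueError(f"Unknown thesis type: {thesis_type}")


print(status_for("replacement", "1721.1/12345"))
print(status_for("replacement"))
```

One benefit of the enum is that a misspelled member name raises an AttributeError immediately, whereas a misspelled string literal would pass silently.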

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| dsc/workflows/digitized_theses/workflow.py | Adds DSpace handle lookup for replacement theses in synced-batch flow; switches to enum-based statuses in _create_batch_in_s3. |
| tests/workflows/digitized_theses/test_workflow.py | Renames fixture and adds a unit test for _get_item_submissions_from_synced_batch. |


Comment thread dsc/workflows/digitized_theses/workflow.py Outdated
@jonavellecuerdo

Contributor Author

@ehanson8 Addressed initial Copilot review!

@ehanson8 ehanson8 left a comment

Contributor

Looking good but some structure questions

Comment thread dsc/workflows/digitized_theses/workflow.py
Comment thread dsc/workflows/digitized_theses/workflow.py Outdated
Comment thread dsc/workflows/digitized_theses/workflow.py
jonavellecuerdo added a commit that referenced this pull request Jun 2, 2026
@jonavellecuerdo jonavellecuerdo requested a review from ehanson8 June 2, 2026 14:17

@ehanson8 ehanson8 left a comment

Contributor

Great work!

Comment thread tests/workflows/digitized_theses/test_workflow.py
…nt theses

Why these changes are being introduced:
* The digitized theses workflow was missing a step to retrieve
item handles for replacement theses when creating a batch from
a synced batch folder. The item handle is required in the
submission message sent to DSS during the `submit` step.

How this addresses that need:
* Use DSpace client to retrieve item handle for replacement theses

Side effects of this change:
* None

Relevant ticket(s):
* https://mitlibraries.atlassian.net/browse/IN-1756
@jonavellecuerdo jonavellecuerdo force-pushed the IN-1756-digitized-theses-synced-batches branch from 7226f74 to d650e86 Compare June 3, 2026 12:41
@jonavellecuerdo jonavellecuerdo merged commit a73b9e2 into main Jun 3, 2026
6 checks passed
@jonavellecuerdo jonavellecuerdo deleted the IN-1756-digitized-theses-synced-batches branch June 3, 2026 12:43
4 participants