Conversation
fixes #720 Add tolerant IIIF manifest handling and helper methods to import canvases/images from both Presentation API 2.x and 3.x. Introduces isSupportedManifest, extractCanvases, extractImageUrlsFromCanvas, resolveImageUrl, getServiceId, safeArray/safeObject, getStringValue and readLabel to normalize IDs, labels and image service resolution. Improve fallback for manifest ID and project naming, skip canvases without importable images, and strengthen file-extension detection (handles queries/fragments and more image types using Locale). Overall makes manifest ingestion more resilient and compatible with real-world IIIF manifests.
|
@thehabes-static-review This should be small and in scope |
…ld-accept-presi3-manifests
There was a problem hiding this comment.
Static Review Comments
Branch: 720-createprojectfrommanifestmanifest-should-accept-presi3-manifests
Review Date: 2026-04-09
Reviewer: Static Reviewer - @thehabes
Bryan and Claude make mistakes. Verify all issues and suggestions. Avoid unnecessary scope creep.
| Category | Issues Found |
|---|---|
| 🔴 Critical | 0 |
| 🟠 Major | 1 |
| 🟡 Minor | 2 |
| 🔵 Suggestions | 3 |
Critical Issues 🔴
No critical issues found.
Major Issues 🟠
Issue 1: Duplicate canvases when manifest contains both sequences and items
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:235
Category: Logic Error
Problem:
extractCanvases processes both sequences (v2) and items (v3) independently and concatenates the results. If a real-world manifest includes both fields (e.g., a v3 manifest retaining sequences for backward compatibility, or a hybrid manifest), every canvas will be added to the list twice. This creates duplicate Folio records in the database — duplicate pages in the project with no way for the user to distinguish originals from copies.
The PR description explicitly aims to handle "real-world IIIF manifests," and manifests with both fields do exist in the wild.
Current Code:
private static List<JSONObject> extractCanvases(JSONObject manifest) {
List<JSONObject> canvases = new ArrayList<>();
if (manifest.has("sequences")) {
// ... adds canvases from sequences
}
if (manifest.has("items")) {
// ... adds canvases from items too — duplicates if both exist
}
return canvases;
}Suggested Fix:
Prefer items (v3) over sequences (v2), falling back only when needed:
private static List<JSONObject> extractCanvases(JSONObject manifest) {
List<JSONObject> canvases = new ArrayList<>();
// Prefer IIIF v3 items over v2 sequences
if (manifest.has("items")) {
JSONArray items = safeArray(manifest.get("items"));
for (int i = 0; i < items.size(); i++) {
JSONObject item = safeObject(items.get(i));
if (item == null) {
continue;
}
String type = getStringValue(item, "type", "@type");
if (type == null || type.trim().isEmpty() || "Canvas".equals(type) || "sc:Canvas".equals(type)) {
canvases.add(item);
}
}
}
// Fall back to v2 sequences only if items yielded nothing
if (canvases.isEmpty() && manifest.has("sequences")) {
JSONArray sequences = safeArray(manifest.get("sequences"));
for (int i = 0; i < sequences.size(); i++) {
JSONObject sequence = safeObject(sequences.get(i));
if (sequence == null || !sequence.has("canvases")) {
continue;
}
JSONArray seqCanvases = safeArray(sequence.get("canvases"));
for (int j = 0; j < seqCanvases.size(); j++) {
JSONObject canvas = safeObject(seqCanvases.get(j));
if (canvas != null) {
canvases.add(canvas);
}
}
}
}
return canvases;
}How to Verify:
- Construct or find a IIIF manifest that contains both
sequences(v2 style) anditems(v3 style) with the same canvases. - Import that manifest via
?manifest={URL}. - Verify the created project contains each page exactly once, not duplicated.
Minor Issues 🟡
Issue 1: checkIfFileHasExtension matches without requiring a dot separator
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:476
Category: Logic Error
Problem:
The extension check uses lower.endsWith(entry) without requiring a . before the extension. This means a URL path segment like "mapping" would match "png" since "mapping".endsWith("png") is true. In that case the code would treat it as a direct image URL and skip constructing the proper IIIF Image API URL from the service, resulting in a broken image link.
While the core matching logic is carried over from the original code, this method was substantially rewritten in this PR (null checks, query/fragment stripping, case normalization, expanded extension list), making it a good opportunity to fix.
Current Code:
String[] extn = {"png", "jpg", "jpeg", "gif", "webp", "avif", "apng", "bmp", "svg", "ico", "tif", "tiff", "jp2"};
return Arrays.stream(extn).anyMatch(entry -> lower.endsWith(entry));Suggested Fix:
String[] extn = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".avif", ".apng", ".bmp", ".svg", ".ico", ".tif", ".tiff", ".jp2"};
return Arrays.stream(extn).anyMatch(entry -> lower.endsWith(entry));How to Verify:
- Confirm that URLs ending in known image extensions (e.g.,
image.jpg,photo.PNG,scan.jp2?token=abc) still returntrue. - Confirm that strings like
"mapping","foobmp", or"notajpeg"returnfalse.
Issue 2: System.out.println and out.println used instead of logger
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:139,162,176
Category: Code Hygiene
Problem:
Three lines write to stdout rather than using the LOG logger. In a servlet container (Tomcat), stdout goes to catalina.out without log level, timestamp, or class context — making it harder to filter, rotate, or monitor operationally. The PR modified or introduced two of these three lines (lines 139 and 162), so this is new code using stdout.
Current Code:
out.println("Go over " + canvases.size() + " canvases"); // line 139
System.out.println(folioscreated+" folios created from "+canvasespresent+" canvases"); // line 162
System.out.println("Set "+mss.getFolios().length+" folios for newProject "+projectID); // line 176Suggested Fix:
LOG.log(INFO, "Go over {0} canvases", canvases.size());
LOG.log(INFO, "{0} folios created from {1} canvases", new Object[]{folioscreated, canvasespresent});
LOG.log(INFO, "Set {0} folios for newProject {1}", new Object[]{mss.getFolios().length, projectID});How to Verify:
- Deploy and import a manifest.
- Confirm the messages appear in the application log with proper formatting (timestamp, level, class) rather than as plain lines in
catalina.out.
Suggestions 🔵
Suggestion 1: IIIF Image API size parameter changed from full to !4000,5000
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:348-350
Category: Breaking Change (behavioral)
The constructed IIIF Image API URL changed from full/full/0/default.jpg to full/!4000,5000/0/default.jpg. This constrains requested images to fit within 4000x5000 pixels — a reasonable practical limit that prevents timeouts on extremely large originals. However, this applies to all manifests (v2 and v3), not just v3. Existing v2 manifests that previously produced full-resolution folio images will now produce constrained images.
If this is intentional (likely, given the PR's goal of resilient ingestion), consider documenting the constraint. If full-resolution is needed for certain workflows, a configuration option could be added later.
Suggestion 2: readLabel could prefer the en locale for internationalized labels
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:432
Category: Code Quality
IIIF v3 labels are language maps (e.g., {"en": ["Title"], "fr": ["Titre"]}). The current code checks none first, then iterates keySet() in unspecified order. Since T-PEN is an English-language application, checking for en before falling back to any available key would produce more predictable labels:
// After checking "none", check "en" before iterating all keys
if (labelObj.has("en")) {
JSONArray en = safeArray(labelObj.get("en"));
if (!en.isEmpty()) {
String value = String.valueOf(en.get(0));
if (!value.trim().isEmpty()) {
return value;
}
}
}Suggestion 3: isSupportedManifest is very permissive
File: src/java/edu/slu/tpen/servlet/ClassicProjectFromManifest.java:226
Category: Code Quality
The fallback manifest.has("items") || manifest.has("sequences") would accept any JSON object with those fields, including IIIF Collections (which also have items). While tolerant ingestion is the goal, a Collection imported as a Manifest would silently produce unexpected results — the items would be nested Manifests, not Canvases, and extractCanvases would add them as canvases with no images, resulting in an empty project. A log warning when falling back to structural detection (no type field matched) would help diagnose these cases without rejecting the manifest.
If there are significant code changes in response to this review please test those changes. Run the application manually and test or perform internal application tests when applicable.
…si3-manifests' of https://github.com/CenterForDigitalHumanities/TPEN28 into 720-createprojectfrommanifestmanifest-should-accept-presi3-manifests
Return HTTP 400 when a manifest includes canvases but no importable images were created (aborts processing by returning -1). Also prefer the "en" entry when extracting localized labels before iterating other keys so English labels are chosen when available.
thehabes
left a comment
There was a problem hiding this comment.
Already deployed and tested, and so far these changes are a success. The issues noted in the static review below are non-blocking. We are OK to merge this.
Static Review Comments
Branch: 720-createprojectfrommanifestmanifest-should-accept-presi3-manifests
Review Date: 2026-04-14
Reviewer: Pair Static Review - Claude & @thehabes
Claude and Bryan make mistakes. Verify all issues and suggestions. Avoid unnecessary scope creep.
| Category | Issues Found |
|---|---|
| 🔴 Critical | 0 |
| 🟠 Major | 1 |
| 🟡 Minor | 2 |
| 🔵 Suggestions | 1 |
Critical Issues 🔴
None found.
Major Issues 🟠
Issue 1: isSupportedManifest accepts any JSON with items or sequences — may ingest IIIF Collections or other non-Manifest objects
File: ClassicProjectFromManifest.java:230-237
Category: Logic Error
Problem:
The fallback at line 236 returns true for any JSON object that has an items or sequences key, regardless of type. IIIF Collections also have an items array (containing Manifests, not Canvases). If a user accidentally submits a Collection URL, the code would attempt to iterate Collection members as if they were Canvases, likely producing garbage folios or silent failures.
Current Code:
private static boolean isSupportedManifest(JSONObject manifest) {
String type = getStringValue(manifest, "type", "@type");
if ("Manifest".equals(type) || "sc:Manifest".equals(type)) {
return true;
}
// Be tolerant of real-world manifests with weak or missing type metadata.
return manifest.has("items") || manifest.has("sequences");
}Suggested Fix:
private static boolean isSupportedManifest(JSONObject manifest) {
String type = getStringValue(manifest, "type", "@type");
if ("Manifest".equals(type) || "sc:Manifest".equals(type)) {
return true;
}
// Reject objects that are explicitly typed as something other than a Manifest
if (type != null && !type.trim().isEmpty()) {
return false;
}
// Be tolerant of real-world manifests with weak or missing type metadata.
return manifest.has("items") || manifest.has("sequences");
}This way, a Collection with "type": "Collection" or "@type": "sc:Collection" is correctly rejected, while truly typeless manifests are still accepted.
How to Verify:
- Test with a IIIF Collection URL (e.g., one from the BnF or a university library that serves collection-level manifests).
- Confirm the endpoint returns a clear error rather than creating a broken project.
Minor Issues 🟡
Issue 2: checkIfFileHasExtension matches without a dot delimiter
File: ClassicProjectFromManifest.java:489-490
Category: Logic Error (pre-existing, but worsened by expanded extension list)
Problem:
The extension check uses lower.endsWith(entry) where entry is just the extension without a leading dot. This means a URL path segment like somethingavif or graphic would falsely match "avif" or "ico" respectively. The original code had this same issue, but the expanded extension list (now 13 entries including short ones like ico, bmp, svg) increases the chance of false positives.
Current Code:
String[] extn = {"png", "jpg", "jpeg", "gif", "webp", "avif", "apng", "bmp", "svg", "ico", "tif", "tiff", "jp2"};
return Arrays.stream(extn).anyMatch(entry -> lower.endsWith(entry));Suggested Fix:
String[] extn = {".png", ".jpg", ".jpeg", ".gif", ".webp", ".avif", ".apng", ".bmp", ".svg", ".ico", ".tif", ".tiff", ".jp2"};
return Arrays.stream(extn).anyMatch(entry -> lower.endsWith(entry));How to Verify:
Test with an image URL whose last path segment doesn't contain a dot (e.g., https://example.com/images/myimagejpg) — it should NOT be treated as having a file extension.
Issue 3: readLabel prefers none over en for IIIF v3 language maps
File: ClassicProjectFromManifest.java:427-444
Category: Logic Error
Problem:
In IIIF Presentation API 3.0, the label property is a language map like {"en": ["English Title"], "none": ["Untitled"]}. The none key means "language not specified" and is typically a fallback. The code checks none before en, so a manifest with both keys would use the unlocalized none value instead of the English label. For an English-language application like T-PEN, en should take priority.
Current Code:
if (labelObj.has("none")) {
JSONArray none = safeArray(labelObj.get("none"));
// ... returns none value
}
// After checking "none", check "en" before iterating all keys
if (labelObj.has("en")) {
JSONArray en = safeArray(labelObj.get("en"));
// ... returns en value
}Suggested Fix:
Swap the order — check en first, then none:
if (labelObj.has("en")) {
JSONArray en = safeArray(labelObj.get("en"));
if (!en.isEmpty()) {
String value = String.valueOf(en.get(0));
if (!value.trim().isEmpty()) {
return value;
}
}
}
if (labelObj.has("none")) {
JSONArray none = safeArray(labelObj.get("none"));
if (!none.isEmpty()) {
String value = String.valueOf(none.get(0));
if (!value.trim().isEmpty()) {
return value;
}
}
}How to Verify:
Import a v3 manifest that has both en and none label values (common in Europeana, BnF manifests). Check the resulting project/folio names use the English label.
Suggestions 🔵
Suggestion 1: readLabel iterates all keys redundantly after explicit none/en checks
File: ClassicProjectFromManifest.java:446-455
After checking none and en explicitly, the for (Object keyObj : labelObj.keySet()) loop iterates all keys including none and en again. It's not a bug (the method already returned if those keys matched), but you could skip them in the loop for clarity:
for (Object keyObj : labelObj.keySet()) {
String key = String.valueOf(keyObj);
if ("en".equals(key) || "none".equals(key) || "@value".equals(key)) {
continue; // already checked above
}
// ... rest of loop
}If there are significant code changes in response to this review please test those changes. Run the application manually and test or perform internal application tests when applicable.
fixes #720
Add tolerant IIIF manifest handling and helper methods to import canvases/images from both Presentation API 2.x and 3.x. Introduces isSupportedManifest, extractCanvases, extractImageUrlsFromCanvas, resolveImageUrl, getServiceId, safeArray/safeObject, getStringValue and readLabel to normalize IDs, labels and image service resolution. Improve fallback for manifest ID and project naming, skip canvases without importable images, and strengthen file-extension detection (handles queries/fragments and more image types using Locale). Overall makes manifest ingestion more resilient and compatible with real-world IIIF manifests.