
Skip metadata assignment for untyped divisions#6931

Draft
BartChris wants to merge 7 commits into
kitodo:mainfrom
BartChris:untyped_division_metadata

Conversation

@BartChris
Collaborator

@BartChris BartChris commented Mar 17, 2026

This pull request addresses the issue that Kitodo currently often injects unwanted metadata into the METS file. The issue has been described in several places, e.g.:

#4362 (comment)
or
#6024 (comment)

The problem is that it is currently not mandatory to define rules in the ruleset that prevent the uncontrolled insertion of metadata such as the processTitle, for example:

<restriction division="" unspecified="forbidden">
    <permit division="page"/>
    <permit division="track"/>
    <permit division="other"/>
</restriction>

<restriction division="page" unspecified="forbidden">
    <permit key="ORDERLABEL" maxOccurs="1"/>
</restriction>

If the institution does not have those rules in place, page elements or unspecified (untyped) elements that are created for newspaper issues might receive unwanted metadata injections.
My fix therefore does two things:

  • If the type of a logical section is undefined, do not inject metadata there (and, as a consequence, do not create unwanted DMDSecs).
  • Do not inject unwanted metadata into physical pages.

If both things are wanted, they have to be implemented in a safer way. We cannot guarantee that all institutions know about those settings and have them in place. The behavior should therefore prevent unwanted insertions by default.
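
The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of this PR: `Division`, `Kind`, `MetadataGuard`, and `shouldReceiveMetadata` are hypothetical names, Kitodo's real data model looks different, and rule 2 is simplified here to "never assign automatically to physical divisions".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a structure element; not Kitodo's real data model. */
final class Division {
    enum Kind { LOGICAL, PHYSICAL }

    final Kind kind;
    final String type;                               // null or blank means "untyped"
    final List<String> metadata = new ArrayList<>();

    Division(Kind kind, String type) {
        this.kind = kind;
        this.type = type;
    }
}

/** Hypothetical guard deciding whether automatic metadata assignment is allowed. */
final class MetadataGuard {
    private MetadataGuard() { }

    /** Rule 2: physical divisions get nothing. Rule 1: untyped logical divisions get nothing. */
    static boolean shouldReceiveMetadata(Division division) {
        if (division.kind == Division.Kind.PHYSICAL) {
            return false;
        }
        return division.type != null && !division.type.trim().isEmpty();
    }

    /** Adds the entry only where the guard allows it; returns whether it was added. */
    static boolean assign(Division division, String entry) {
        if (!shouldReceiveMetadata(division)) {
            return false;
        }
        division.metadata.add(entry);
        return true;
    }
}
```

With this guard, a typed logical division such as an issue still receives its metadata, while an untyped container and a physical page stay empty and therefore give no reason to create a DMDSec.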

@codacy-production

codacy-production Bot commented Apr 2, 2026

Up to standards ✅

Issues: 0 new issues
Metrics: complexity 8, duplication 0

@solth solth self-requested a review April 9, 2026 07:19
@solth solth added the ruleset ruleset functions and configuration label Apr 13, 2026
@BartChris BartChris force-pushed the untyped_division_metadata branch 2 times, most recently from a13e01a to f7786eb Compare April 14, 2026 16:29
@BartChris
Collaborator Author

> This Pull request addresses the issue that Kitodo right now often injects unwanted metadata into the METS file. The issue has been described in different places, e.g.

It might be that this fixes only part of the issue. As outlined by @michaelkubina in #4362 (comment), Kitodo also seems to inject default values (presets) into the untyped divisions. It would have to be traced where exactly this happens (process creation? saving in the metadata editor?).

@BartChris BartChris marked this pull request as draft April 15, 2026 08:41
@BartChris BartChris force-pushed the untyped_division_metadata branch from a597d31 to 6c0e48b Compare April 15, 2026 09:51
public void preserve() throws InvalidMetadataValueException, NoSuchMetadataFieldException {
    try {
        if (isDivisionUntyped()) {
            logger.warn("Skipping metadata preservation for untyped division.");
Collaborator

How helpful is this warning message? It carries no context information, so neither an administrator nor any other user with access to the log files can act on it or inform anyone. Maybe the user should be informed about this case in the UI instead.

Collaborator Author

@BartChris BartChris Apr 15, 2026

You are right; I am trying to address a behaviour that should be fixed by the application, and logging it seems overkill. I think "preserve" is called far too often, so the user would also be annoyed by those messages.

@BartChris
Collaborator Author

BartChris commented Apr 15, 2026

After discussing the problem with @michaelkubina: the unwanted enrichments happen when an institution has defined metadata keys as "always showing" and with a default preset, e.g.:

<key id="docType">
    <label>Document Type</label>
    <label lang="de">Dokumenttyp</label>
    <option value="monograph">
        <label>Monograph</label>
        <label lang="de">Monographie</label>
    </option>
    <option value="multivolume_work">
        <label>Multivolume Work</label>
        <label lang="de">Mehrbändiges Werk</label>
    </option>
    <preset>monograph</preset>
</key>

and:

<setting key="docType" editable="true" alwaysShowing="true"/>

When you have those keys defined and, in the metadata editor view of the issue, select one of the divisions without a type, those default values are injected in the UI:

image

Those UI values are then preserved in the meta.xml upon save. To quote @michaelkubina (translated from German):

Flow when opening a process:
"Issue" is selected -> Save + Close -> everything is fine
"Issue" is selected -> click on the first "Division without type" and go back to "Issue" -> Save + Close -> a DMDSec has been created for the division without type
"Issue" is selected -> click on the second "Division without type" and go back to "Issue" -> Save + Close -> a DMDSec has been created for the second division without type.

When this setting is defined in the ruleset:

<restriction division="" unspecified="forbidden">
    <permit division="page"/>
    <permit division="track"/>
    <permit division="other"/>
</restriction>

this does not happen, because the fields in question are not actually rendered in the UI. My fix does not prevent them from appearing in the UI, but it prevents those values from being serialized into the meta.xml, because untyped divisions are skipped when preserve is called (preserve maps the UI data to the Kitodo data model, which is then serialized into the XML).

This PR therefore introduces a safety net for institutions that do not have the necessary ruleset rules defined.
Another step might be to block the metadata display in the UI for those untyped divisions so that they are not part of the serialized data.
The long-term solution would probably be to get rid of untyped divisions in Kitodo.
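
To make the mechanism easier to follow, here is a minimal sketch of the flow described above. All names (`DivisionModel`, `EditorPanel`, `wouldWriteDmdSec`) are hypothetical and only model the idea; the sketch also assumes that a DMDSec is written exactly when a division carries metadata, which is a simplification of the real serialization.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of one division as it would be serialized; not Kitodo's real class. */
final class DivisionModel {
    final String type;                                        // empty or null means untyped
    final Map<String, String> metadata = new LinkedHashMap<>();

    DivisionModel(String type) {
        this.type = type;
    }

    boolean isUntyped() {
        return type == null || type.trim().isEmpty();
    }

    /** Assumption of this sketch: a DMDSec is written only if metadata is present. */
    boolean wouldWriteDmdSec() {
        return !metadata.isEmpty();
    }
}

/** Hypothetical model of the editor panel that appears when a division is selected. */
final class EditorPanel {
    final DivisionModel division;
    final Map<String, String> rows = new LinkedHashMap<>();

    EditorPanel(DivisionModel division, Map<String, String> alwaysShowingPresets) {
        this.division = division;
        // Selecting the division renders the "always showing" keys with their presets.
        rows.putAll(alwaysShowingPresets);
    }

    /** Maps the UI rows back to the model; untyped divisions are skipped silently. */
    void preserve() {
        if (division.isUntyped()) {
            return;
        }
        division.metadata.putAll(rows);
    }
}
```

In this model, clicking an untyped division still fills the panel rows with the `docType` preset, but `preserve()` no longer copies them into the division, so nothing is left to serialize for it on save.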

@BartChris BartChris marked this pull request as ready for review April 15, 2026 13:27
@BartChris BartChris force-pushed the untyped_division_metadata branch from e9de62a to 4f64566 Compare April 15, 2026 13:30
@BartChris
Collaborator Author

My commit 1f0d82a goes one step further: if the division is untyped, we should not inject metadata in the UI layer either. Untyped divisions no longer show any data that might get serialized into the meta.xml:

image
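
The UI-layer idea can be illustrated with a small sketch. `MetadataRowFactory` and `rowsFor` are hypothetical names and do not reflect the actual change in the commit; the sketch assumes that an empty or missing type marks an untyped division and that stored values take precedence over presets.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that decides which rows the editor renders for a division. */
final class MetadataRowFactory {
    private MetadataRowFactory() { }

    /**
     * Builds the rows for the selected division.
     * For an untyped division (empty or null type) nothing is rendered,
     * so nothing can be carried over into the saved file from the UI.
     */
    static Map<String, String> rowsFor(String divisionType,
                                       Map<String, String> stored,
                                       Map<String, String> alwaysShowingPresets) {
        if (divisionType == null || divisionType.trim().isEmpty()) {
            return Collections.emptyMap();
        }
        Map<String, String> rows = new LinkedHashMap<>(alwaysShowingPresets);
        rows.putAll(stored);   // stored values win over presets
        return rows;
    }
}
```

Blocking the rows at this point complements the check in preserve: the first keeps presets out of the panel, the second keeps anything that still ends up there out of the data model.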

@BartChris
Collaborator Author

BartChris commented Apr 15, 2026

I am not entirely sure whether I am introducing a behaviour change here. It might be that people actually assign metadata to the untyped divisions. That is of course only possible if the metadata is actually rendered in the UI and preserved on save. My fix basically enforces that the untyped divisions do not get any metadata assigned.

Maybe @andre-hohmann can comment here.

@BartChris
Collaborator Author

BartChris commented Apr 15, 2026

> My fix basically enforces that the untyped divisions do not get any metadata assigned.

This of course does not rule out enriching metadata here on export or via XSL. See: #4362 (comment)

@BartChris BartChris force-pushed the untyped_division_metadata branch from 1f0d82a to e96358e Compare April 15, 2026 14:07
@andre-hohmann
Collaborator

I hope I understood correctly that "untyped divisions do not get any metadata assigned" refers to both:

  1. Manual assignment via the metadata editor, and
  2. Automatic assignment during the creation process using the calendar.

Regarding 1 (Manual):
At SLUB, we correct issues that are assigned to wrong dates. In these cases, the ORDERLABEL of the second untyped division is corrected as well.

Regarding 2 (Automatic):
This is required to generate the ORDERLABEL for the day in the <mets:structMap TYPE="LOGICAL"> for the issue.
This is then used to derive the ORDERLABEL for the month by XSLT, as suggested by @stefanCCS.

Wouldn't this be solved if the container levels were eliminated? See #4362 (comment).
As written in the comment, it would be necessary to generate the values for the month and day level in the issue from the information in the year process.

@BartChris
Collaborator Author

> Wouldn't this be solved if the container levels were eliminated? See #4362 (comment)

I agree, but I am not really sure what this means exactly and what the consequences would be. But thanks a lot for your comment. Given what you say, my changes are probably too radical, as I think they would break your existing workflows. On the other hand, the current behaviour of metadata injection can have really destructive consequences. I have to think more about this.

@BartChris BartChris marked this pull request as draft April 15, 2026 14:31
@BartChris BartChris force-pushed the untyped_division_metadata branch from 9ef678e to fdc4db9 Compare April 15, 2026 15:40
BartChris added 5 commits May 18, 2026 17:33
Prevent metadata from being written to structural container nodes
of type page without a TYPE. Recursion is preserved, but only semantic divisions
receive metadata. Fixes unintended DMDSEC creation.
@BartChris BartChris force-pushed the untyped_division_metadata branch 2 times, most recently from 2569d89 to aecd7f9 Compare May 18, 2026 15:36
@BartChris BartChris force-pushed the untyped_division_metadata branch 3 times, most recently from e26b480 to 84dfb9d Compare May 18, 2026 15:52
@BartChris BartChris force-pushed the untyped_division_metadata branch from 84dfb9d to f10a525 Compare May 18, 2026 15:55
