Add audio transcript and video description features for a11y#5393

Open
AndreaBarbasso wants to merge 1 commit into DSpace:main from 4Science:task/main/CST-22298_squashed

Conversation

@AndreaBarbasso
Contributor

@AndreaBarbasso AndreaBarbasso commented Apr 3, 2026

References

Description

Added new bitstream metadata fields so that an audio transcript and/or a video description can be attached to an audio/video bitstream. These fields are managed in the submission form and in the bitstream edit form. Introduced modals to show their values on the item page.

Instructions for Reviewers

List of changes in this PR:

  • Changed the bitstream submission form to accommodate the newly created metadata fields;
  • Applied a custom logic to show/hide fields relevant to the selected media type;
  • Extended the current bitstream edit form to - again - accommodate these new metadata fields;
  • Introduced clickable links to show audio transcripts and video descriptions in a modal above the item page.
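
The show/hide rule in the second bullet can be sketched as follows. This is a minimal illustration, not the code in this PR: the names `MediaType`, `A11yField` and `visibleFields`, and the `'audiovideo'` value, are assumptions made for the example.

```typescript
// Hypothetical names throughout: MediaType, A11yField and visibleFields
// are illustrative and are not taken from the PR's code.
type MediaType = 'audio' | 'video' | 'audiovideo';
type A11yField = 'audioTranscript' | 'videoDescription';

// Returns the accessibility textareas to display for a given media type.
function visibleFields(mediaType: MediaType | null): A11yField[] {
  switch (mediaType) {
    case 'audio':
      return ['audioTranscript'];
    case 'video':
      return ['videoDescription'];
    case 'audiovideo':
      return ['audioTranscript', 'videoDescription'];
    default:
      // No media type selected: show neither field.
      return [];
  }
}
```

Keeping the rule in one pure function would let the submission form and the bitstream edit form share it, which matches the stated goal that both forms behave the same.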

Include guidance for how to test or review your PR.

  • Create a new Item and upload a bitstream. Try changing its media type: an audio type should show the audio transcript textarea, a video type should show the video description textarea, and an audio and video type should show both;
  • Confirm that these values are saved correctly as bitstream metadata;
  • Confirm that changing the media type does not delete metadata that is no longer relevant (e.g. selecting the video media type should not clear the audio transcript metadata value). This is expected: since these values can be quite long and tedious to write, changing the media type by accident should not clear them;
  • After depositing the item, check that its bitstream metadata can be changed. The bitstream edit form should behave the same as the submission form;
  • Check that the item page shows links relevant to the bitstream type. An audio bitstream with an audio transcript should show the transcript link, etc;
  • Check that clicking on those links opens a modal with the metadata value shown inside.
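
Two of the checks above (values survive a media type change, and links appear only for relevant, non-empty values) can be expressed as a small sketch. The state shape and the function names `changeMediaType` and `linksToShow` are hypothetical and only illustrate the expected behavior; they are not the PR's implementation.

```typescript
// Hypothetical form state; the property names are illustrative only.
interface BitstreamA11yState {
  mediaType: 'audio' | 'video' | 'audiovideo' | null;
  audioTranscript: string;
  videoDescription: string;
}

// Changing the media type only swaps the type. Values of fields that are
// now hidden are kept, so an accidental change does not lose long text.
function changeMediaType(
  state: BitstreamA11yState,
  mediaType: BitstreamA11yState['mediaType'],
): BitstreamA11yState {
  return { ...state, mediaType };
}

// A link is listed only when the media type is relevant AND a value exists.
function linksToShow(state: BitstreamA11yState): string[] {
  const hasAudio = state.mediaType === 'audio' || state.mediaType === 'audiovideo';
  const hasVideo = state.mediaType === 'video' || state.mediaType === 'audiovideo';
  const links: string[] = [];
  if (hasAudio && state.audioTranscript.trim() !== '') {
    links.push('transcript');
  }
  if (hasVideo && state.videoDescription.trim() !== '') {
    links.push('description');
  }
  return links;
}

// Usage: switching from audio to video keeps the transcript text.
const initial: BitstreamA11yState = {
  mediaType: 'audio',
  audioTranscript: 'Hello',
  videoDescription: '',
};
const switched = changeMediaType(initial, 'video');
// switched.audioTranscript is still 'Hello', while linksToShow(switched)
// is empty because a video-only bitstream has no description yet.
```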

Checklist

  • My PR is created against the main branch of code (unless it is a backport or is fixing an issue specific to an older branch).
  • My PR is small in size (e.g. less than 1,000 lines of code, not including comments & specs/tests), or I have provided reasons as to why that's not possible.
  • My PR passes ESLint validation using npm run lint
  • My PR doesn't introduce circular dependencies (verified via npm run check-circ-deps)
  • My PR includes TypeDoc comments for all new (or modified) public methods and classes. It also includes TypeDoc for large or complex private methods.
  • My PR passes all specs/tests and includes new/updated specs or tests based on the Code Testing Guide.
  • My PR aligns with Accessibility guidelines if it makes changes to the user interface.
  • My PR uses i18n (internationalization) keys instead of hardcoded English text, to allow for translations.
  • My PR includes details on how to test it. I've provided clear instructions to reviewers on how to successfully test this fix or feature.
  • If my PR includes new libraries/dependencies (in package.json), I've made sure their licenses align with the DSpace BSD License based on the Licensing of Contributions documentation.
  • If my PR includes new features or configurations, I've provided basic technical documentation in the PR itself.
  • If my PR fixes an issue ticket, I've linked them together.

@lgeggleston lgeggleston added the bug, accessibility, component: submission, and funded (Task is funded via the DSpace Development Fund) labels Apr 3, 2026
@lgeggleston lgeggleston moved this to 🙋 Needs Reviewers Assigned in DSpace 10.0 Release Apr 3, 2026
@AndreaBarbasso AndreaBarbasso marked this pull request as ready for review April 3, 2026 13:16
Member

@tdonohue tdonohue left a comment

@AndreaBarbasso : I'm starting to review this work, and I'm a little confused about the implementation that you've created, as it doesn't appear to align with the tickets #3807 and #3808:

  • In #3807, we've noted that WCAG requires either an audio/video description/transcript or a full text alternative.
  • In #3808, we also talk about allowing an upload of full text alternative content and labeling it appropriately.

This PR appears to only add a way to create a description/transcript. It doesn't fix the issues with uploading alternative content & relating it back to the original file.

I was expecting to have two a11y features implemented:

  1. An option to add a description/transcript for the Bitstream (ideally in a dc.description metadata field, as that already exists).
  2. An option to upload a secondary file which provides a textual alternative to audio/video content.

This PR currently only implements the description/transcript feature. It doesn't allow for alternative content. So, it appears to be a partial fix for #3807 and doesn't solve #3808 at all.

My opinion is that we can simplify this PR's functionality to just have a single Description metadata field which can capture either a transcript (for audio) or a description (for video). I'm not sure that we need two separate metadata fields (description and transcript), especially if we are going to implement the alternative content upload option.

I added more feedback to the backend PR as I feel we should use existing metadata fields as much as possible.

@github-project-automation github-project-automation bot moved this from 🙋 Needs Reviewers Assigned to 👀 Under Review in DSpace 10.0 Release Apr 13, 2026
@AndreaBarbasso
Contributor Author

@tdonohue, a few comments inline:

This PR appears to only add a way to create a description/transcript. It doesn't fix the issues with uploading alternative content & relating it back to the original file.

I thought the issue regarding the upload of alternative content was set to be done in a follow-up issue, as said here. We can try to add the alternative content upload to these PRs, but I don't think it's a trivial development.

My opinion is that we can simplify this PR's functionality to just have a single Description metadata field which can capture either a transcript (for audio) or a description (for video). I'm not sure that we need two separate metadata fields (description and transcript), especially if we are going to implement the alternative content upload option.

Looking at this comment, I understood that two different fields can be useful for audio+video files, since the audio transcript can be used by users with hearing impairment, and the video description can be used by users with sight impairment. I think we need a clear direction on how to move forward from here.

Last but not least, I agree with the change of metadata keys you proposed.

@tdonohue
Member

tdonohue commented Apr 14, 2026

@AndreaBarbasso : My apologies, it looks like I misunderstood the goal of this PR because I had assumed it was supposed to directly match the tickets it's fixing. I had forgotten about those follow-up discussions where we narrowed down the task (and it looks like we never updated the ticket descriptions -- which is what I was looking at).

That all said, I think my suggestions for better metadata fields to use are still valid (as you noted).

Overall, we can keep this PR mostly "as-is", but I'd recommend the following updates:

  • Correct the metadata fields used as suggested.
    • If possible, I'd prefer we represent audio & video as two dc.type values of audio and video. But, based on the current UI, that may be too complex. So, I'm OK if we just create a third value of audio+video (instead of audiovideo)
  • We can retain the two description fields as you currently have (but update the metadata field they are using). Obviously only one will be displayed if you select just audio or video, but both can be displayed if you select both.
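
The two representations mentioned above (two separate dc.type values, or a single combined audio+video value) could both be read by one helper. A minimal sketch, assuming the value names from this comment; `MediaKinds` and `mediaKinds` are hypothetical names and nothing here is taken from the PR's code.

```typescript
// Hypothetical helper: the value names follow the suggestion above
// ("audio", "video", "audio+video"); nothing here is from the PR's code.
interface MediaKinds {
  audio: boolean;
  video: boolean;
}

// Accepts either representation: two separate dc.type values
// (["audio", "video"]) or a single combined value (["audio+video"]).
function mediaKinds(dcTypeValues: string[]): MediaKinds {
  const kinds: MediaKinds = { audio: false, video: false };
  for (const value of dcTypeValues) {
    for (const token of value.toLowerCase().split('+')) {
      const name = token.trim();
      if (name === 'audio') {
        kinds.audio = true;
      }
      if (name === 'video') {
        kinds.video = true;
      }
    }
  }
  return kinds;
}
```

With a helper like this, the UI could display one or both description fields without caring which of the two dc.type encodings was stored.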

Sorry to confuse things. This all essentially means that I'm OK with your current approach because it aligns with prior discussions from last year. We just need to clean up the metadata fields used.

@AndreaBarbasso
Contributor Author

@tdonohue, thanks for the update!

Since we are going to retain the two description fields, which metadata fields should they be saved in? I guess the audio transcript can be saved in dc.description.transcript, but I'm not sure about saving the video description in dc.description. I guess we might have some video files with both a "simple" description (e.g. "A video of a ballerina, dancing to the notes of The Blue Danube") and a complete video description. Shouldn't we use an ad-hoc metadata field for the video description?

Moreover, we already have the dc.description metadata field in the Bitstream form, and it suggests a "brief description of the file".

@tdonohue
Member

@AndreaBarbasso : Good point about the overlaps with dc.description.

Ok, maybe instead of trying to shove these into Dublin Core, we should take a step back and just consider using our internal dspace metadata schema. We already have a few dspace.bitstream.* metadata fields, so we could just create two more:

  • dspace.bitstream.transcript - Transcript of the bitstream (for audio/video content).
  • dspace.bitstream.textalternative - Detailed description of contents of bitstream which provides a text-based alternative (especially for audio/video/image content).

That might be the easiest here to ensure we aren't overusing dc.description. We can keep that field for a brief description of a given file.

Does that seem reasonable?

@AndreaBarbasso
Contributor Author

@tdonohue, I think that the proposed metadata fields do not reflect the logic that has been implemented: showing either one field (audio transcript for audio files, video description for video files), or both of them if the media contains audio and video data. Moreover, the text alternative content is still set to be done in the follow-up issue.

I'd suggest:

  • dspace.bitstream.transcript or dspace.bitstream.audiotranscript for audio transcripts;
  • dspace.bitstream.description or dspace.bitstream.videodescription for video descriptions.

Let me know if this sounds good to you!

@mwoodiupui
Member

Has anyone talked to people with expertise in cataloging enriched audiovisual materials? Are there already established, or at least publicly proposed, metadata standards here?

@tdonohue
Member

tdonohue commented Apr 16, 2026

@AndreaBarbasso : I'm a bit confused by your last comment because I understand ticket #4438 differently.

To quote that ticket's description:

In #3807, we'll have a way to prompt users for alternative media text / descriptions in the submission form, and store those values in metadata fields like dc.description.transcript, dc.description.audiodescription or dc.description.textalternative.

This first part is saying that the fix for #3807 should store alternative media text / descriptions in metadata fields similar to dc.description.transcript, dc.description.audiodescription or dc.description.textalternative. (These are metadata field names we came up with in our Developer Meeting, if I recall correctly)

This is where I came up with the fields named: dspace.bitstream.transcript and dspace.bitstream.textalternative. I borrowed these qualifiers from this ticket's description.

Later that ticket says

However, we've realized that a future improvement would be to also prompt the user to upload alternative files, especially transcript and caption files:

This is saying that the future work is to provide a way to upload alternative files instead of using metadata fields. That's the work that is new in #4438.

So, in my opinion, the metadata fields that I suggested are still appropriate. We need to achieve two different use cases:

  1. We need a metadata field which can store a smaller sized transcript, especially for audio files. We agree this can be dspace.bitstream.transcript
  2. We need a metadata field that can store a more detailed description (or "alternative text") for video files. In this case, I like the field dspace.bitstream.textalternative because it could be used both for video files (to store this detailed description) as well as eventually for image files (to store "alt" text). In reality the "video description" is just a more detailed form of "alternative text" alongside a video.
    • I feel naming this field dspace.bitstream.description is not as good as it's unclear what the "description" means. I feel text alternative is more descriptive because it's clear this text provides a visual alternative to watching the video, in the same way that "alt" text for images does.
    • I feel the name dspace.bitstream.videodescription could work. But it makes this field very specific to video files. I was hoping to find a more generic name that could be used for the "video description" along with other similar descriptions for visual/audio files. That's why I prefer "textalternative", as it's a textual alternative to the visual/audio information.
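
To make the two use cases concrete, here is a minimal sketch of reading both fields from a bitstream's metadata. It assumes a simplified metadata map (field name to a list of values) and uses the field names proposed in this comment, which are still under discussion. `MetadataValue`, `MetadataMap` and `firstValue` are hypothetical names, not DSpace classes.

```typescript
// Simplified shape of the metadata map on a bitstream; real DSpace
// objects carry more properties per value. The field names are the ones
// proposed in this thread and may still change.
interface MetadataValue {
  value: string;
  language?: string | null;
}
type MetadataMap = { [field: string]: MetadataValue[] };

const TRANSCRIPT_FIELD = 'dspace.bitstream.transcript';
const TEXT_ALTERNATIVE_FIELD = 'dspace.bitstream.textalternative';

// Returns the first non-blank value of a field, or null when absent.
function firstValue(metadata: MetadataMap, field: string): string | null {
  const values = metadata[field] || [];
  for (const entry of values) {
    if (entry.value.trim() !== '') {
      return entry.value;
    }
  }
  return null;
}

// Hypothetical video bitstream: dc.description keeps the brief file
// description, while the detailed text lives in the dspace.* field.
const example: MetadataMap = {
  'dc.description': [{ value: 'A video of a ballerina' }],
  'dspace.bitstream.textalternative': [
    { value: 'A dancer enters from the left and ...' },
  ],
};

const transcript = firstValue(example, TRANSCRIPT_FIELD);
const textAlternative = firstValue(example, TEXT_ALTERNATIVE_FIELD);
```

In this example the bitstream has no transcript, so only the text alternative would be available to display.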

@tdonohue
Member

@mwoodiupui :

Has anyone talked to people with expertise in cataloging enriched audiovisual materials? Are there already established, or at least publicly proposed, metadata standards here?

I don't have access to anyone with this expertise, but I did search around for established best practices for storing these descriptions in Dublin Core. Unfortunately, all I found was that most are using dc.description (a very generic field) to store this info. That's not something we can use because we already use dc.description on Bitstreams for other basic descriptions (e.g. "Presentation accompanying paper" or similar)

That said, if you have other resources you can ask, I'd welcome more feedback.

@AndreaBarbasso
Contributor Author

We need a metadata field that can store a more detailed description (or "alternative text") for video files. In this case, I like the field dspace.bitstream.textalternative because it could be used both for video files (to store this detailed description) as well as eventually for image files (to store "alt" text). In reality the "video description" is just a more detailed form of "alternative text" alongside a video.

@tdonohue, I now see what you mean and agree with you. I'll just wait a bit to see if we can get any feedback from someone with more expertise on metadata standards regarding a11y on media files, as @mwoodiupui suggested, if that's ok!

@mwoodiupui
Member

A quick search for "metadata namespaces for av" turned up a number of leads. It wouldn't have occurred to me to look for something this medium-specific in Dublin Core. DCTerms does have some fields that might serve for linking main-content and alternate-content bitstreams.

I've asked a couple of our local librarians. The only suggestion so far is "upload more than one file per record and describe them accordingly."

@mwoodiupui
Member

I tend to be very careful about assigning something to the dspace namespace because it is internal. To interpret it, an external agent would have to know what DSpace means by using that field (if it is even able to see it). A well-known standard's namespace doesn't present that problem.

@tdonohue
Member

I've asked a couple of our local librarians. The only suggestion so far is "upload more than one file per record and describe them accordingly."

@mwoodiupui : We do plan to eventually allow upload of alternative bitstreams. However, that work has been deferred to a different ticket, see #4438. This current work (in this PR) is just a small step in that direction, where we are going to store alternative text in Bitstream metadata. But, as discussed in a past Developer Meeting (see also #3807 (comment)), we all agree that this has limited use (as transcripts or similar are often better stored in a Bitstream because they are larger).

So, this is all to say, it sounds like the librarians you talked to are agreeing with our general, long-term direction. But, this PR is just an initial step to store smaller alternative content in metadata in order to better align with WCAG 1.2.3 at a very basic level, until we have time to build an approach that links together related Bitstreams.

In my opinion, the dspace namespace is best used for data that DSpace needs to store which doesn't "fit" well into standard-based metadata schemas like Dublin Core, QDC, or Schema.org. As far as I can tell, none of these standard metadata schemas have recommendations for storage of accessibility-related textual content. So, I feel this is a scenario where we have to create our own place to store this in DSpace, which is why I recommended the dspace schema. The only other option I see is to create non-standard qualifiers on dc.description, but I'm not personally a fan of continuing to invent non-standard qualifiers for QDC.

Labels

accessibility, bug, component: submission, funded (Task is funded via the DSpace Development Fund)

Projects

Status: 👀 Under Review

Development

Successfully merging this pull request may close these issues.

[Accessibility] Providing labels for alternative content
[Accessibility] Media alternative for time-based content

4 participants