
MSC4456: Harms taxonomy#4456

Open
turt2live wants to merge 3 commits into `main` from `travis/msc/harms-appendix`

Conversation

**@turt2live** (Member) commented Apr 28, 2026:

Warning

Content Warning: This proposal discusses and identifies harmful content, but does not attempt to describe the harm posed in detail. This includes identifiers for child safety, sexual abuse, self-harm, and other types of harm a user may encounter on the open internet.


Rendered

This proposal was split out from MSC4387.

@turt2live changed the title from "MSC: Harms taxonomy" to "MSC4456: Harms taxonomy" on Apr 28, 2026
@turt2live marked this pull request as ready for review on April 28, 2026 at 19:08
@turt2live added the labels `proposal` (A Matrix spec change proposal), `meta` (Something that is not a spec change/request and is not related to the build tools), `kind:core` (MSC which is critical to the protocol's success), `needs-implementation` (This MSC does not have a qualifying implementation for the SCT to review; the MSC cannot enter FCP), and `reporting-v2` on Apr 28, 2026
**@turt2live** (Member Author) commented:

Implementation requirements:

  • 2+ MSCs in different areas using this. For example, search redirection and reporting.

**@turt2live** (Member Author) commented:

Technically met by the following two MSCs, but I'd like to see each further along in the spec process before considering them true implementations of this MSC:

Comment thread on `proposals/4456-safety-harms-appendix.md`, lines +43 to +46:
* `m.spam.fraud` - Fraud/Phishing
* `m.spam.impersonation` - Impersonation
* `m.spam.election_interference` - Election Interference
* `m.spam.flooding` - Flooding
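As an aside on the shape of these identifiers: assuming the final taxonomy keeps this dotted `m.<category>.<specific>` form (which is not yet settled in the MSC), tooling can recover the top-level category by prefix, as in this sketch:

```python
# Sketch only: the parsing scheme is an assumption based on the dotted
# structure of the draft identifiers, not something the MSC specifies.

def category_of(code: str) -> str:
    """Return the top-level category of a harm code, e.g. 'm.spam'."""
    parts = code.split(".")
    return ".".join(parts[:2]) if len(parts) >= 2 else code

draft_codes = [
    "m.spam.fraud",
    "m.spam.impersonation",
    "m.spam.election_interference",
    "m.spam.flooding",
]
assert all(category_of(code) == "m.spam" for code in draft_codes)
```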
A reviewer commented:

It may be worthwhile to have a separate m.misinformation category, especially to include other forms of synthetic media/deepfakes besides m.adult.deepfake (not all deepfakes are inherently sexual, so there could be e.g. m.misinformation.deepfake). "Election interference" feels like it wouldn't always be a subcategory of spam.

Having them be independent also means you can classify m.misinformation.fraud alongside m.spam where they come in a list.

Additionally, if the UX was designed to align with these categories, it wouldn't make sense for a user to view these under "spam," IMHO.

**@turt2live** (Member Author) replied:

Looking at Bluesky (which this list is partly inspired by), they label "spam" as "Misleading - spam or other inauthentic behaviour or deception". We might want to adopt similar labeling for the "Spam" category we have here.

They also consider deepfakes to be primarily adult content. I suspect that if a user was reporting a deepfake that wasn't easily classified as adult content then they'd use "impersonation" or "other misleading content" (to use the Bluesky terms).

@@ -0,0 +1,145 @@
# MSC4456: Harms taxonomy
A Member commented:

I am very dubious that we should be baking a taxonomy like this into the Matrix spec, because:

  1. The spec is already huge
  2. Harms can be very subjective and will encourage bikeshedding or bloat. e.g. where is Lese-majesty on the list?
  3. In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
  4. Why do we care about semantic codes here at all?
  5. I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations

At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.

**@thetayloredman** commented May 1, 2026:

To be fair, many online reporting platforms have a flow specifically for selecting harms information, which I believe to be the point of the MSC:

[screenshot of such a reporting flow]

It feels reasonable to want to put a list of some common harms in to make the interface better and potentially aid in tooling without the intrinsic requirement for LLMs :D

I do agree that it might be better outside of the spec, but the question for me is where does this get defined, because it feels very necessary.

A commenter replied, addressing each point:

> 1. The spec is already huge

Building a better way to communicate necessarily involves a bunch of detail!

> 2. Harms can be very subjective and will encourage bikeshedding or bloat.

That is a risk. Defining a baseline taxonomy rather than a comprehensive one may help to mitigate that risk.

> 3. In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
> 4. Why do we care about semantic codes here at all?

With semantic codes, clients can more easily build better reporting flows, offering tailored advice based on the type of harm the user has experienced (e.g. direction to helplines, law enforcement, how to keep themselves safe). Servers and communities can use the codes to communicate why enforcement action was taken against a user or piece of content (a requirement in many safety laws). Safety teams can use user-provided codes to triage reports more effectively, both with human teams, and by routing to the most cost/time-effective automated flows. When all servers in a Matrix federation share a common taxonomy of harms, it simplifies sharing details of those harms over federation.
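As a concrete illustration of the tailored-advice point (a sketch only: the guidance strings and the fallback behaviour here are hypothetical, not from the MSC), a client's reporting flow could be a simple lookup keyed on the code:

```python
# Hypothetical client-side mapping from harm code to user-facing guidance.
# The guidance text is illustrative only; nothing here is from the MSC.
GUIDANCE = {
    "m.spam.fraud": "Do not send money or credentials to this user.",
    "m.spam.impersonation": "Consider verifying identities before trusting messages.",
}

DEFAULT_GUIDANCE = "Thanks for your report. A moderator will review it."

def guidance_for(code: str) -> str:
    # Unknown or 'other' codes fall back to generic advice.
    return GUIDANCE.get(code, DEFAULT_GUIDANCE)
```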

> 5. I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations

Using semantic reasons enables the use of more cost-effective and faster single-purpose models rather than slower, more expensive general models, and enables routing to appropriate humans for review.
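The triage-routing argument can be sketched as longest-prefix matching over the code namespace (the queue names and routing table here are hypothetical; a real deployment would tune its own):

```python
# Hypothetical triage router: longest-prefix match so that specific
# codes (e.g. fraud) can override their broader category's queue.
ROUTES = {
    "m.spam": "automated-spam-classifier",
    "m.spam.fraud": "fraud-review-team",
}
DEFAULT_QUEUE = "human-triage"

def route(code: str) -> str:
    best = ""
    for prefix in ROUTES:
        if (code == prefix or code.startswith(prefix + ".")) and len(prefix) > len(best):
            best = prefix
    return ROUTES.get(best, DEFAULT_QUEUE)
```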

> At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.

An alternative: reference an external standard, as we do with RFCs elsewhere in the spec. Unfortunately, there doesn't appear to be a suitable standard that provides this taxonomy at present, but this could be something to explore and then replace this proposal down the line. The DTSP framework (ISO/IEC 25389) is an example of nascent work in safety standardisation that doesn't do what we need here. The spec could also offer appendices for this type of content, if there are concerns? I think the spec should contain appropriate guidance for building safe servers and clients, so I'd be comfortable with us including it in the body of the spec.

**@turt2live** (Member Author) replied:

The closest I can find for an existing reference are AT Proto's com.atproto.moderation.defs and tools.ozone.report.defs models. Obviously, these definitions are highly targeted at AT Proto's use cases, but the parallels in this MSC should be fairly evident as well :)

We may benefit from just copying AT Proto's definitions directly, or working with them to create an external standard that works for both of us. This MSC currently suggests we do something similar to what AT Proto did: create an appendix/definition that exists within their world and refer to it as needed.

The next closest I can find for an existing reference is the European Commission's Transparency Database API which describes content in two ways: a Category and a Category Specification. By nature of it being backed by the Digital Services Act (DSA), it's highly focused on that particular regulatory environment - it does not easily apply to other environments such as the UK, US, Australia, or Canada (despite these places copying most of each other's work in law creation).

Related work comes from the World Economic Forum (WEF), which attempts to describe harms in ways that users can understand, but is hardly a "harm identifier" list. Their report can be found here.

The Trust & Safety Professional Association (TSPA) attempts to list the types of abuse, but also doesn't define machine-friendly identifiers for those abuse types. It may be possible to ask them to create a machine-friendly taxonomy for their list, though I expect it'll be too broad for our purposes in Matrix.

IFTAS is primarily used by ActivityPub, but is not a standards organization. They do however provide definitions for 3 types of harmful content (and how to deal with it): by actor, behaviour, or content. Like TSPA, we might be able to ask them to consider a machine-friendly specification for these types of harms. Being associated with ActivityPub might make them more applicable to Matrix too.

If we really don't want to host the list as Matrix, we can probably look to the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) to establish a set of identifiers. The DPVCG might not take on the work because not all harms are privacy related.

OASIS might be able to help create a standard external to Matrix as well, though their primary output locations are the ISO and IEC. It may be faster/easier/different to go through a local national body instead, like the Standards Council of Canada (SCC). The DTSP Safe Framework Specification is hosted as ISO/IEC 25389 (as Jim mentions), so it's plausible that we could get a similar harms taxonomy specification there too.

DTSP might also be able to help create a standard to reference.


Labels

`kind:core`, `meta`, `needs-implementation`, `proposal`, `reporting-v2`, `safety`
