MSC4456: Harms taxonomy (#4456)
Conversation
Implementation requirements:
- 2+ MSCs in different areas using this. For example, search redirection and reporting.
Technically met by the following two MSCs, but I'd like to see each further along in the spec process before considering them true implementations of this MSC:
> * `m.spam.fraud` - Fraud/Phishing
> * `m.spam.impersonation` - Impersonation
> * `m.spam.election_interference` - Election Interference
> * `m.spam.flooding` - Flooding
It may be worthwhile to have a separate `m.misinformation` category, especially to include other forms of synthetic media/deepfakes besides `m.adult.deepfake` (not all deepfakes are inherently sexual, so there could be e.g. `m.misinformation.deepfake`). "Election interference" feels like it wouldn't always be a subcategory of spam.
Having them be independent also means you can classify `m.misinformation.fraud` alongside `m.spam` where they come in a list (sketched further below).
Additionally, if the UX was designed to align with these categories, it wouldn't make sense for a user to view these under "spam," IMHO.
Looking at Bluesky (which this list is partly inspired by), they label "spam" as "Misleading - spam or other inauthentic behaviour or deception". We might want to adopt similar labeling for the "Spam" category we have here.
They also consider deepfakes to be primarily adult content. I suspect that if a user was reporting a deepfake that wasn't easily classified as adult content then they'd use "impersonation" or "other misleading content" (to use the Bluesky terms).
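To make the earlier point about independent top-level categories concrete, here is a minimal sketch of a report body carrying more than one harm code. The `org.example.msc4456.harms` field name is hypothetical (the current `POST /_matrix/client/v3/rooms/{roomId}/report/{eventId}` body only defines `reason` and `score`), so this illustrates the data shape rather than anything this MSC specifies:

```python
import json

# Hypothetical report body: the "org.example.msc4456.harms" field name is
# invented for this sketch; only "reason" and "score" exist in the current
# client-server reporting API.
def build_report_body(harm_codes: list[str], reason: str) -> str:
    body = {
        "reason": reason,                         # free-text fallback, as today
        "org.example.msc4456.harms": harm_codes,  # hypothetical list of codes
    }
    return json.dumps(body, indent=2)

# Independent top-level categories let one report carry both labels:
print(build_report_body(
    ["m.misinformation.deepfake", "m.spam.flooding"],
    "AI-generated video being mass-posted across rooms",
))
```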
> # MSC4456: Harms taxonomy
I am very dubious that we should be baking a taxonomy like this into the Matrix spec, because:
- The spec is already huge
- Harms can be very subjective and will encourage bikeshedding or bloat. For example, where is lèse-majesté on the list?
- In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
- Why do we care about semantic codes here at all?
- I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations
At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.
To be fair, many online reporting platforms have a flow specifically for selecting harms information, which I believe to be the point of the MSC:
*[screenshot: the harm-selection step of a typical online reporting flow]*
It feels reasonable to want to put a list of some common harms in to make the interface better and potentially aid in tooling without the intrinsic requirement for LLMs :D
I do agree that it might be better outside of the spec, but the question for me is where this would get defined, because it feels very necessary.
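As a rough illustration of the tooling point, a client could drive its report dialog directly from a published taxonomy. Everything below except the `m.spam.*` identifiers quoted from the diff is an assumption for this sketch (the labels and the `m.other` fallback are invented):

```python
# Sketch of a taxonomy-driven report dialog. The code-to-label mapping is
# invented for illustration; only the m.spam.* identifiers appear in the
# quoted diff, and m.other is a hypothetical fallback.
HARM_LABELS = {
    "m.spam.fraud": "Fraud or phishing",
    "m.spam.impersonation": "Impersonation",
    "m.spam.election_interference": "Election interference",
    "m.spam.flooding": "Flooding / repeated unwanted messages",
    "m.other": "Something else (describe below)",
}

def top_level(code: str) -> str:
    """Return the top-level category of a harm code, e.g. 'm.spam'."""
    return ".".join(code.split(".")[:2])

def render_picker() -> None:
    """Print a grouped selection menu a client might show when reporting."""
    by_category: dict[str, list[str]] = {}
    for code in HARM_LABELS:
        by_category.setdefault(top_level(code), []).append(code)
    for category, codes in by_category.items():
        print(category)
        for code in codes:
            print(f"  [ ] {HARM_LABELS[code]}  ({code})")

render_picker()
```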
> - The spec is already huge

Building a better way to communicate necessarily involves a bunch of detail!

> - Harms can be very subjective and will encourage bikeshedding or bloat.

That is a risk. Defining a baseline taxonomy rather than a comprehensive one may help to mitigate that risk.

> - In practice we'll always need an 'other' fallback with a natural language explanation anyway - why not use natural language all along?
> - Why do we care about semantic codes here at all?

With semantic codes:

- Clients can more easily build better reporting flows, offering tailored advice based on the type of harm the user has experienced (e.g. direction to helplines, law enforcement, how to keep themselves safe).
- Servers and communities can use the codes to communicate why enforcement action was taken against a user or piece of content (a requirement in many safety laws).
- Safety teams can use user-provided codes to triage reports more effectively, both with human teams and by routing to the most cost/time-effective automated flows (a rough routing sketch follows below).
- When all servers in a Matrix federation share a common taxonomy of harms, it simplifies sharing details of those harms over federation.
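A hedged sketch of that triage/routing point follows. The queue names, escalation rules, and the non-`m.spam` prefixes are invented for illustration rather than identifiers this MSC defines:

```python
# Hypothetical triage routing: map harm-code prefixes to review queues.
# Queue names, escalation rules, and the non-m.spam prefixes are invented.
ROUTING = {
    "m.child_safety": "escalate_to_specialist_team",
    "m.self_harm": "escalate_to_specialist_team",
    "m.spam": "automated_spam_pipeline",
    "m.adult": "human_review_queue",
}
DEFAULT_QUEUE = "human_review_queue"

def route_report(harm_code: str) -> str:
    """Pick a review queue from the most specific matching code prefix."""
    prefix = harm_code
    while prefix:
        if prefix in ROUTING:
            return ROUTING[prefix]
        prefix, _, _ = prefix.rpartition(".")
    return DEFAULT_QUEUE

assert route_report("m.spam.flooding") == "automated_spam_pipeline"
assert route_report("m.unrecognised.harm") == DEFAULT_QUEUE
```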
> - I suspect that we're going to see more and more LLM-based moderation functionality in future, which will be quite happy to process natural language reasons rather than trying to create a set of reason enumerations

Using semantic reasons enables the use of more cost-effective & faster single-purpose models rather than slower, more expensive general models, and enables routing to appropriate humans for review.

> At the least, I'd expect the reasons to sit in an external registry somewhere to avoid bloating the spec.

An alternative: reference an external standard, as we do with RFCs elsewhere in the spec. Unfortunately, there doesn't appear to be a suitable standard that provides this taxonomy at present, but this could be something to explore and then replace this proposal down the line. The DTSP framework (ISO/IEC 25389) is an example of nascent work in safety standardisation, but it doesn't do what we need here. The spec could also offer appendices for this type of content if there are concerns. I think the spec should contain appropriate guidance on building safe servers and clients, so I'd be comfortable with us including it in the body of the spec.
The closest existing references I can find are AT Proto's `com.atproto.moderation.defs` and `tools.ozone.report.defs` models. Obviously, these definitions are highly targeted at AT Proto's use cases, but the parallels in this MSC should be fairly evident as well :)
We may benefit from just copying AT Proto's definitions directly, or working with them to create an external standard that works for both of us. This MSC currently suggests we do something similar to what AT Proto did: create an appendix/definition that exists within their world and refer to it as needed.
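To illustrate what copying or aligning with AT Proto's definitions could look like, here is a rough translation sketch. The AT Proto identifiers are report reasons from `com.atproto.moderation.defs`; the correspondence to Matrix-style codes, and every code on the right other than `m.spam`, is guesswork for this sketch rather than anything agreed between the projects:

```python
# Illustrative mapping from AT Proto report reasons to Matrix-style harm codes.
# The right-hand codes other than m.spam are hypothetical placeholders.
ATPROTO_TO_MATRIX = {
    "com.atproto.moderation.defs#reasonSpam": "m.spam",
    "com.atproto.moderation.defs#reasonMisleading": "m.misinformation",
    "com.atproto.moderation.defs#reasonSexual": "m.adult",
    "com.atproto.moderation.defs#reasonViolation": "m.other",
    "com.atproto.moderation.defs#reasonOther": "m.other",
}

def translate(atproto_reason: str) -> str:
    """Translate an AT Proto reason into a Matrix-style harm code."""
    return ATPROTO_TO_MATRIX.get(atproto_reason, "m.other")

print(translate("com.atproto.moderation.defs#reasonSpam"))  # m.spam
```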
The next closest I can find for an existing reference is the European Commission's Transparency Database API, which describes content in two ways: a Category and a Category Specification. By nature of being backed by the Digital Services Act (DSA), it's highly focused on that particular regulatory environment - it does not easily apply to other environments such as the UK, US, Australia, or Canada (despite these places copying most of each other's work in law creation).
Related work comes from the World Economic Forum (WEF), which attempts to describe harms in ways that users can understand, but is hardly a "harm identifier" list. Their report can be found here.
The Trust & Safety Professional Association (TSPA) attempts to list the types of abuse, but also doesn't define machine-friendly identifiers for those abuse types. It may be possible to ask them to create a machine-friendly taxonomy for their list, though I expect it'll be too broad for our purposes in Matrix.
IFTAS is primarily used by ActivityPub, but is not a standards organization. They do however provide definitions for 3 types of harmful content (and how to deal with it): by actor, behaviour, or content. Like TSPA, we might be able to ask them to consider a machine-friendly specification for these types of harms. Being associated with ActivityPub might make them more applicable to Matrix too.
If we really don't want to host the list as Matrix, we can probably look to the W3C Data Privacy Vocabularies and Controls Community Group (DPVCG) to establish a set of identifiers. The DPVCG might not take on the work because not all harms are privacy related.
OASIS might be able to help create a standard external to Matrix as well, though their primary output locations are the ISO and IEC. It may be faster/easier/different to go through a local national body instead, like the Standards Council of Canada (SCC). The DTSP Safe Framework Specification is hosted as ISO/IEC 25389 (as Jim mentions), so it's plausible that we could get a similar harms taxonomy specification there too.
DTSP might also be able to help create a standard to reference.
Warning
Content Warning: This proposal discusses and identifies harmful content, but does not attempt
to describe the harm posed in detail. This includes identifiers for child safety, sexual abuse,
self-harm, and other types of harm a user may encounter on the open internet.
This proposal was split out from MSC4387.