MSC4438: Message bookmarks via account data#4438
Conversation
30a3a09 to
037fa44
Compare
There was a problem hiding this comment.
Pull request overview
Adds a new MSC proposal describing an interoperable client-side convention for bookmarking Matrix room events using global account data, with a per-user index event for ordering and one account-data event per bookmark for metadata.
Changes:
- Defines account data event types and schemas for a bookmark index and per-bookmark items.
- Specifies client workflows for adding, removing (tombstoning), listing, and syncing bookmarks across devices.
- Documents privacy/security considerations and an example bookmark ID generation approach.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Implementation requirements:
- Client (sending)
- Client (receiving/rendering)
turt2live
left a comment
There was a problem hiding this comment.
Early MSC checklist requirements review - I haven't meaningfully looked at the MSC content.
| ### Bookmark index event | ||
|
|
||
| The bookmark index event maintains the bookmark ID sequence for all active bookmarks along with | ||
| versioning metadata. |
There was a problem hiding this comment.
What's the benefit of this? Account data is generally synced in its entirety
There was a problem hiding this comment.
It lists which bookmarks are active (not deleted) and hints the ordering.
| ``` | ||
|
|
||
| Clients must ignore malformed bookmark item events. Clients must also ignore bookmark item events | ||
| where `deleted` is `true`. |
There was a problem hiding this comment.
What's the benefit of a separate deleted field over simply clearing the content?
There was a problem hiding this comment.
Finding all trashed bookmarks will require a linear scan of the entire bookmark set this way. If you really need a wastebin, maybe it would be better to just collect the deleted bookmarks in a separate m.bookmarks.wastebin.index?
There was a problem hiding this comment.
True. Undeletion will require some one-time effort.
This is required when someone actually decides to implement undeletion feature. On the contrary, maintaining the wastebin will require effort on every client, whether it implements undeletion or not.
Conversely:
- What is the benefit of simply clearing the content?
|
|
||
| The presence of `room_id`, `event_id`, and `uri` duplicates some information by design. This lets | ||
| clients use the explicit room and event identifiers directly for display, grouping, partitioning, | ||
| or lookup purposes without having to parse URI components out of `uri`. |
There was a problem hiding this comment.
Why is uri there at all? It doesn't seem like there's any benefit to that over room/event ID. If routing information is expected to be needed, it'd be better to just add a via field and remove uri
There was a problem hiding this comment.
The idea is to bookmark the uri.
The room_id, event_id are additional data, for ease of client implementation - like grouping by room.
There was a problem hiding this comment.
Could the URI not be built on demand and locally from the other properties in the bookmark structure?
There was a problem hiding this comment.
I suppose it could. But using the uri as the main concern will allow natural updates in the future, when the uri schema gets expanded.
There was a problem hiding this comment.
I do not feel strongly about keeping room_id and event_id fields though.
I could as well generate it by parsing uri when inserting the bookmark into my client bookmark storage.
2138484 to
b518b9a
Compare
|
|
||
| The `revision` field allows clients to detect concurrent modifications from other devices. When a | ||
| client receives an updated index via `/sync` with a `revision` higher than the locally known | ||
| value, it should refresh its bookmark list from the server. |
There was a problem hiding this comment.
Would this not be a relatively rare situation given that the index should only be changed based on user interaction and people tend to not use two devices in parallel?
There was a problem hiding this comment.
I am a counter-example. I usually run two devices: mobile phone and desktop client.
I sometimes run a third client in a browser tab.
There was a problem hiding this comment.
Right but you probably don't bookmark events in two or more clients at the same time?
There was a problem hiding this comment.
I can't promise I won't do it. 😅
| * When creating a bookmark, if a client already knows an existing bookmark item for the same | ||
| `(room_id, event_id)`, it should reuse that item's `bookmark_id` instead of generating a new | ||
| one. | ||
| * Clients should make reasonable effort to deduplicate bookmarks for the same `(room_id, event_id)`. |
There was a problem hiding this comment.
What is the use case for bookmarking the same event twice? The only thing that will differ are bookmark_id, bookmarked_ts and the cached content data.
There was a problem hiding this comment.
There is no proper use case. It is a bug, which should be prevented with reasonable effort.
There was a problem hiding this comment.
In that case it sounds like we should just prescribe a scheme for generating bookmark IDs that prevents that?
There was a problem hiding this comment.
Even if we do, the client will need to do the effort to check for duplicates, in order to prevent clashing with identifiers generated with buggy other clients, buggy older version of the same client, older version of the MSC.
If so, we might as well leave it free to the client implementation discrepancy.
| The ordering of `bookmark_ids` should be treated as the default display ordering hint. Clients may | ||
| sort, group, or partition bookmarks differently when presenting bookmarks to the user. |
There was a problem hiding this comment.
Given that bookmarks in browsers use a file / folder schema, I wonder if we should add support for this from the start?
There was a problem hiding this comment.
If someone is willing to build more features based on this MSC, I see no blockers in the current implementation. Implementations of this MSC should just ignore unknown fields, that might be added by later protocols.
| ### Bookmark ID | ||
|
|
||
| The `bookmark_id` identifies a bookmark item and forms the suffix of the corresponding account | ||
| data event type. This MSC does not require any particular bookmark ID generation method. |
There was a problem hiding this comment.
I like this proposal as it can be used as message level read indicators (not to be confused with read receipts) and expanded more.
I'm not super versed on the matrix spec stuff but not having a standardized bookmark ID generation method seems counterintuitive to being a spec to me. If it's not the same across all clients, then you can't really use between different clients unless they share the same bookmark ID generation.
There was a problem hiding this comment.
The bookmark_id is just an opaque string to be used as an item in the index.
Standardizing on the id generation algorithm could result in an expectation, that there is some semantic encoded in the identifier, while it should not be.
Clients would need to be resilient to out-of-spec identifiers anyway, due to bugs in implementation in the wild. We can standardize on such behavior then.
Following list is perfectly fine, and should be handled by any client without additional effort:
"bookmark_ids": [
"bmk_1a2b3c4d",
"bmk_5e6f7788",
"chr_shahphie3K",
"9836b821-da3a-4599-be1f-76e390ae3f31",
"54c30437-8775-4f44-bc60-8865702594ea",
"chr_Hiemu5ahKo",
"69JH5EWWFKTDQCT5N5D9LAPEMQNV7HMNUA7RSFJX2VHEelmt"
]
|
This may also benefit from the work towards encrypting account data introduced in #4441. The primitives for encrypting account data could potentially be factored out into a separate proposal? |
b518b9a to
98ee0d8
Compare
98ee0d8 to
8f1810b
Compare
|
I've removed the details of reference implementation |
Rendered
Implementations:
Signed-off-by: Tomasz Sterna tomasz@sterna.link