Data track schema metadata #1159

Open
ladvoc wants to merge 32 commits into main from ladvoc/schema-metadata
@ladvoc ladvoc commented Jun 10, 2026

Contributor

Adds support for associating schema metadata with a published data track and storing/retrieving schema definitions.

Schema storage is built on top of data blobs, a general-purpose mechanism for storing large (on the order of KBs), arbitrary blobs of data in a room:

Protocol additions for schema metadata:

@github-actions

Contributor

Changeset

The following package versions will be affected by this PR:

| Package | Bump |
| --- | --- |
| livekit | patch |
| livekit-datatrack | patch |
| livekit-ffi | patch |

ladvoc commented Jun 17, 2026

Contributor Author

For local testing:

@ladvoc ladvoc marked this pull request as ready for review June 18, 2026 18:35

@1egoman 1egoman left a comment

Contributor

Generally makes sense to me!

I think DataTrackSchemaId being a non-primitive value is key to future interface evolution (for example, adding a generic type parameter with a default which, when set explicitly, can be used to assert the type of payloads later on).
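
To make the "generic type with a default" idea concrete, here is a minimal sketch of how a struct-based ID could grow a defaulted type parameter without breaking existing callers. All names (`SchemaId`, `Encoding`, `Untyped`, `Telemetry`, `typed`, `check_payload`) are hypothetical stand-ins for illustration, not the types or methods in this PR.

```rust
use std::marker::PhantomData;

/// Marker meaning "no compile-time payload type attached" (hypothetical).
pub struct Untyped;

/// Hypothetical stand-in for `DataTrackSchemaEncoding`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    JsonSchema,
}

/// Hypothetical schema ID. The defaulted parameter lets code that names the
/// type as plain `SchemaId` keep compiling after the parameter is added.
pub struct SchemaId<T = Untyped> {
    name: String,
    encoding: Encoding,
    _payload: PhantomData<T>,
}

impl SchemaId<Untyped> {
    /// Untyped constructor, mirroring how an ID is built today.
    pub fn new(name: impl Into<String>, encoding: Encoding) -> Self {
        Self { name: name.into(), encoding, _payload: PhantomData }
    }
}

impl<T> SchemaId<T> {
    /// Re-tags the ID with an explicit payload type chosen by the caller.
    pub fn typed<U>(self) -> SchemaId<U> {
        SchemaId { name: self.name, encoding: self.encoding, _payload: PhantomData }
    }

    pub fn name(&self) -> &str {
        &self.name
    }

    pub fn encoding(&self) -> Encoding {
        self.encoding
    }
}

/// Example payload type (hypothetical).
pub struct Telemetry {
    pub celsius: f64,
}

/// Accepts only payloads whose type matches the ID's type parameter, so a
/// mismatch is rejected at compile time rather than at runtime.
pub fn check_payload<T>(_id: &SchemaId<T>, payload: T) -> T {
    payload
}
```

With this shape, `SchemaId::new("my_schema", Encoding::JsonSchema)` still yields an untyped ID, while `.typed::<Telemetry>()` opts in to payload checking. A primitive such as a bare string would leave no room for the extra parameter.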

const DATA_BLOB_REQUEST_TIMEOUT: Duration = Duration::from_secs(5);

/// Stores an arbitrary blob of data on the server, keyed by `key`.
async fn store_data_blob(&self, key: proto::DataBlobKey, contents: Bytes) -> EngineResult<()> {

@1egoman 1egoman Jun 22, 2026

Contributor

question: Should this be RoomResult<()>, not EngineResult<()>? There are a few other instances of this in the file as well.
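
For readers weighing the two return types, the sketch below shows one common way layered error types are bridged in Rust: a room-level error wraps the engine-level one through a `From` impl, so `?` converts at the boundary. The enum shapes here are assumptions made for illustration; the SDK's actual `EngineError` and `RoomError` definitions may differ.

```rust
/// Hypothetical engine-level error (lower layer).
#[derive(Debug, PartialEq)]
pub enum EngineError {
    Timeout,
}

/// Hypothetical room-level error (public-facing layer) wrapping engine errors.
#[derive(Debug, PartialEq)]
pub enum RoomError {
    Engine(EngineError),
}

impl From<EngineError> for RoomError {
    fn from(err: EngineError) -> Self {
        RoomError::Engine(err)
    }
}

pub type EngineResult<T> = Result<T, EngineError>;
pub type RoomResult<T> = Result<T, RoomError>;

/// Lower-layer operation reporting engine errors. The `succeed` flag only
/// simulates success or a timeout for this sketch.
fn engine_store_blob(succeed: bool) -> EngineResult<()> {
    if succeed {
        Ok(())
    } else {
        Err(EngineError::Timeout)
    }
}

/// Room-level wrapper: `?` converts `EngineError` into `RoomError` via `From`.
pub fn room_store_blob(succeed: bool) -> RoomResult<()> {
    engine_store_blob(succeed)?;
    Ok(())
}
```

Under this arrangement, which result type a function returns mostly signals which layer it belongs to; callers of the room-level API only ever match on `RoomError`.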

/// nor validated against its [encoding](DataTrackSchemaId::encoding), so
/// the caller is responsible for ensuring it is well-formed.
///
pub async fn define_schema(&self, id: DataTrackSchemaId, definition: String) -> RoomResult<()> {

Contributor

thought: Are you confident that schemas will / should always be text? Or should definition: String be some sort of bytes object, like Vec<u8> or bytes::Bytes (or maybe impl Into<bytes::Bytes>)?

Contributor Author

Yes, while the wire format (frame content) will often be binary, I can't think of any case where the schema definition language would be binary. @stephen-derosa, wdyt?
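
As a point of comparison for this thread, here is a small sketch of what a bytes-accepting signature could look like. It uses `impl Into<Vec<u8>>` from the standard library in place of `bytes::Bytes` so the example has no external dependencies; `SchemaStore` and its methods are hypothetical and are not the PR's API.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store, used only to illustrate the signature.
#[derive(Default)]
pub struct SchemaStore {
    definitions: HashMap<String, Vec<u8>>,
}

impl SchemaStore {
    /// Accepts `String`, `&str`, `Vec<u8>`, and anything else convertible
    /// into `Vec<u8>`, so text callers do not need an explicit conversion.
    pub fn define_schema(&mut self, name: &str, definition: impl Into<Vec<u8>>) {
        self.definitions.insert(name.to_owned(), definition.into());
    }

    /// Returns the raw bytes of a definition, if one is stored.
    pub fn get_bytes(&self, name: &str) -> Option<&[u8]> {
        self.definitions.get(name).map(Vec::as_slice)
    }

    /// Returns the definition as text, or `None` if it is missing or is not
    /// valid UTF-8.
    pub fn get_text(&self, name: &str) -> Option<&str> {
        self.get_bytes(name).and_then(|bytes| std::str::from_utf8(bytes).ok())
    }
}
```

The trade-off is visible in `get_text`: a bytes-based store pushes UTF-8 validation onto every reader, whereas a `String` parameter guarantees valid text at the point of definition.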

let (pub_room, _) = rooms.pop().unwrap();
let (_, mut sub_room_event_rx) = rooms.pop().unwrap();

let schema_id = DataTrackSchemaId::new("my_schema", DataTrackSchemaEncoding::JsonSchema);

Contributor

question: Is this DataTrackSchemaId::new(...) expected to be a typical pattern, maybe for the case where you know that a schema already exists on the SFU end? Or is the idea that, in practice, a user would always call LocalParticipant::define_schema, and it is only done this way here for testing?

@stephen-derosa stephen-derosa left a comment

Contributor

in general looks good, need to give it a deeper dive

optional SubscribeDataTrackError error = 1;
}

// MARK: - Schemas

Contributor

FMU, what is this // MARK: notation?

Contributor Author

This is recognized by most editors (see the minimap and navigator in Cursor/VS Code).

}

#[test]
fn test_frame_encoding_mapping() {

Contributor

test_empty_frame_encoding ?

///
/// Called by a publisher to make a schema available to subscribers, who can
/// later look up its definition via [`get_schema`](Self::get_schema). Define a
/// schema before publishing any data track that references it, so that

Contributor

What is the behavior if a data track with a schema is published before the schema itself is defined?

@ladvoc ladvoc Jun 23, 2026

Contributor Author

The track will be published normally and carry the schema ID. If you care about retrieving the definition on the subscriber end, it is recommended to store the schema before publishing the track so that it is available right away. If you don't need the actual definition for your application, you can simply use the schema ID as an identifier.
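
The ordering described above can be modeled with a toy in-memory room. This is only a sketch: `Room`, `SchemaId`, and the method bodies are hypothetical, and returning `None` for a schema that has not been stored yet is an assumption of the model, since the thread does not say how the real `get_schema` reports a missing definition.

```rust
use std::collections::HashMap;

/// Hypothetical schema identifier, reduced to a name for this sketch.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct SchemaId(pub String);

/// Toy model of a room holding published tracks and stored schemas.
#[derive(Default)]
pub struct Room {
    schemas: HashMap<SchemaId, String>,
    tracks: Vec<(String, SchemaId)>,
}

impl Room {
    /// Publishing succeeds whether or not the schema has been stored yet;
    /// the track simply carries the ID.
    pub fn publish_track(&mut self, track: &str, schema: SchemaId) {
        self.tracks.push((track.to_owned(), schema));
    }

    /// Stores a schema definition under its ID.
    pub fn define_schema(&mut self, id: SchemaId, definition: &str) {
        self.schemas.insert(id, definition.to_owned());
    }

    /// Looks up a definition; `None` until it has been stored (assumed).
    pub fn get_schema(&self, id: &SchemaId) -> Option<&str> {
        self.schemas.get(id).map(String::as_str)
    }

    /// Returns the schema ID carried by a published track.
    pub fn track_schema(&self, track: &str) -> Option<&SchemaId> {
        self.tracks
            .iter()
            .find(|(name, _)| name.as_str() == track)
            .map(|(_, id)| id)
    }
}
```

In this model a subscriber that sees the track before the definition is stored can still read and compare the schema ID, but a definition lookup only succeeds once `define_schema` has run.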


3 participants