Update docs with info on event interleaving #4
Waiting for elixir-nx/bumblebee#454, adding a try block when the GenServer crashes would be a possible workaround
assert_end_of_stream(pipeline_pid, :sink, :input, 20_000)
[
You are not using this list anywhere, are you? :D
No, I only used it to group the assertions together visually.
But I suppose we should assert on receiving these events, and I don't see that happening 🤔
@impl true
def init(opts) do
  GenServer.cast(self(), :serving_start)
  Process.flag(:trap_exit, true)
Nx.Serving.run starts the serving in a separate process internally using spawn_link, so an EXIT signal is sent to the ModelServer when that process crashes. Without the trap, the ModelServer process gets killed before executing the code from the catch block.
I don't know of any way to avoid the trap. I've had an LLM search through Bumblebee and Nx, and it looks like the linking behaviour is not configurable.
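For illustration, here is a minimal sketch of the mechanism described above, using only the standard library. `TrapExitSketch` and the `:serving_crashed` reason are hypothetical stand-ins, not the PR's ModelServer or anything from Nx; the `spawn_link` call only imitates a library that links a worker which then crashes.

```elixir
defmodule TrapExitSketch do
  use GenServer

  # Hypothetical stand-in for the ModelServer; started unlinked for the demo.
  def start, do: GenServer.start(__MODULE__, [])

  @impl true
  def init(_opts) do
    # With this flag set, exit signals from linked processes are delivered
    # as {:EXIT, pid, reason} messages instead of killing this process.
    Process.flag(:trap_exit, true)
    {:ok, %{last_crash: nil}}
  end

  @impl true
  def handle_call(:run, _from, state) do
    # Assumption: imitates a call that links a worker which later crashes.
    spawn_link(fn -> exit(:serving_crashed) end)
    {:reply, :ok, state}
  end

  def handle_call(:last_crash, _from, state) do
    {:reply, state.last_crash, state}
  end

  @impl true
  def handle_info({:EXIT, _pid, reason}, state) do
    # The crash arrives here as a regular message, so the server survives.
    {:noreply, %{state | last_crash: reason}}
  end
end
```

If the `Process.flag/2` call is removed, the same exit signal terminates the GenServer itself, which matches the behaviour described in this comment.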
How about spawning it under a dedicated Agent which would be monitored by the ModelServer?
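A rough sketch of what that could look like, assuming the risky call can be moved into an unlinked Agent. `MonitoredRunnerSketch` and the function passed to it are hypothetical; this is not tested against Nx.Serving and only shows the monitoring pattern.

```elixir
defmodule MonitoredRunnerSketch do
  use GenServer

  # `fun` is a hypothetical stand-in for the code that links a crashing worker.
  def start(fun), do: GenServer.start(__MODULE__, fun)

  @impl true
  def init(fun) do
    # Agent.start/1 (not start_link/1) keeps the agent unlinked from us,
    # so its death cannot take this process down.
    {:ok, agent} = Agent.start(fn -> nil end)
    ref = Process.monitor(agent)

    # The risky work runs inside the agent; any process it links dies with it.
    Agent.cast(agent, fn state ->
      fun.()
      state
    end)

    {:ok, %{agent: agent, ref: ref, last_crash: nil}}
  end

  @impl true
  def handle_info({:DOWN, ref, :process, _pid, reason}, %{ref: ref} = state) do
    # The monitor reports the agent's death as a message; no trap_exit needed.
    {:noreply, %{state | last_crash: reason}}
  end

  @impl true
  def handle_call(:last_crash, _from, state) do
    {:reply, state.last_crash, state}
  end
end
```

The trade-off is that results have to be fetched from the agent explicitly (for example with `Agent.get/3` and a suitable timeout), and the agent has to be restarted after a crash.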
The transcripts are sent via the `:output` pad along with the audio buffers, as `Membrane.Whisper.TranscriptEvent` events.
A sequence of audio buffers is followed by an event containing the transcript for said sequence, e.g.:
`<audio frames 0s-10s> <event with transcription of 0s-10s> <audio frames 10s-20s> <event with transcription of 10s-20s> <audio frames 20s-30s> ...`
It's not clear to me what happens when, e.g., there is a chunk of audio (0s-10s) but the voice starts at 5s: will the TranscriptEvent be sent with a 0s-10s timestamp or a 5s-10s one?
We don't have a mechanism for timestamping the events by default, hence the interleaving. The only way for TranscriptEvent.start_timestamp_seconds etc. to be non-nil is to enable timestamp generation in the Whisper serving via timestamps: :segments.
Unfortunately, enabling timestamps in Whisper seems to make the serving output transcripts that correspond to audio of arbitrary length (the model doesn't respect the chunk duration anymore), violating the interleaving invariant. I'm working on something that might fix this by queueing buffers inside the filter and pushing them downstream based on the timestamps returned by Whisper when enabled, but that's outside the scope of this PR.
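To make the interleaving invariant concrete, here is a small sketch of how a downstream consumer could pair each transcript with the buffers that preceded it. The `{:buffer, payload}` and `{:transcript, text}` tuples are hypothetical simplifications for illustration, not the actual Membrane buffer or `Membrane.Whisper.TranscriptEvent` structs.

```elixir
defmodule InterleavingSketch do
  # Pairs every transcript with the buffers received since the previous
  # transcript. Returns {pairs, leftover}, where leftover holds buffers
  # whose transcript has not arrived yet.
  def pair(items) do
    {pairs, pending} =
      Enum.reduce(items, {[], []}, fn
        {:buffer, payload}, {pairs, pending} ->
          # Accumulate buffers until their transcript shows up.
          {pairs, [payload | pending]}

        {:transcript, text}, {pairs, pending} ->
          # A transcript closes the current group of buffers.
          {[{Enum.reverse(pending), text} | pairs], []}
      end)

    {Enum.reverse(pairs), Enum.reverse(pending)}
  end
end
```

This grouping relies on each transcript covering exactly the buffers sent before it, which is the invariant that enabling `timestamps: :segments` appears to break.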