How to achieve acceleration for the S2S (Speech-to-Speech) mode? #45

Thank you very much for your work. Is it possible to accelerate the **S2S mode**, and if so, when is support expected?
I am currently trying to build an S2S version on top of the already implemented S2T vLLM acceleration, and I have run into a major issue. In `FunAudioChatDecoder`, **`crq_generate_forward` (the SRH head) needs to sample the next token based on `speech_id`, and that sampled token then becomes the next input to the SRH head. However, vLLM handles the updating and merging of `speech_id` at the engine level.** I would appreciate any guidance on this problem.
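
To make the dependency concrete, here is a toy sketch of the loop shape I mean. It is not the actual `crq_generate_forward` code: `srh_generate`, `toy_head`, `num_steps`, and `bos_id` are hypothetical names I made up for illustration, and the "head" is a deterministic stand-in. The point is only that each sampled id must be fed back into the head before the next one can be sampled, so sampling sits inside the loop rather than after it, whereas (as I understand it) vLLM runs its sampler at the engine level after the model's forward pass returns.

```python
import random
from typing import Callable, List


def srh_generate(
    hidden: float,
    head_step: Callable[[float, int], List[float]],
    num_steps: int,
    bos_id: int,
    rng: random.Random,
) -> List[int]:
    """Toy inner loop: sample an id, then feed it back as the next head input.

    Hypothetical stand-in for an SRH-style head, not the real implementation.
    """
    speech_ids: List[int] = []
    prev_id = bos_id
    for _ in range(num_steps):
        # The distribution for this step depends on the previously sampled id.
        weights = head_step(hidden, prev_id)
        # Sampling happens here, inside the loop, not after the forward pass.
        prev_id = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        speech_ids.append(prev_id)
    return speech_ids


def toy_head(hidden: float, prev_id: int, vocab_size: int = 4) -> List[float]:
    """Deterministic toy head: all probability mass on (prev_id + 1) % vocab_size."""
    return [1.0 if i == (prev_id + 1) % vocab_size else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(srh_generate(0.0, toy_head, num_steps=3, bos_id=0, rng=random.Random(0)))
```

Because the toy head puts all of its mass on a single id, the output does not depend on the random seed. With a real head, step `k` cannot start until the id from step `k - 1` has been sampled, which is the part I do not know how to express when sampling is owned by the engine.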
