How to achieve acceleration for the S2S (Speech-to-Speech) mode? #45

@eyree

Thank you very much for your work. Is it possible to accelerate the S2S mode, and if so, when is support expected?
I am currently trying to build an S2S version on top of the already implemented S2T vLLM acceleration, and I have run into a major issue. In FunAudioChatDecoder, crq_generate_forward (the SRH head) needs to sample the next token based on speech_id, and that sampled token then becomes the next input to the SRH head. However, vLLM updates and merges speech_id at the engine level, so the decoder cannot feed the sampled token back itself. I would appreciate any guidance on this problem.
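
To make the dependency concrete, here is a toy sketch of the feedback loop as I understand it. It is not the actual FunAudioChatDecoder code: `toy_srh_logits`, `srh_generate`, `VOCAB_SIZE`, and `GROUP_SIZE` are hypothetical names and values chosen for illustration, and the "head" is a dummy arithmetic function standing in for the real SRH network.

```python
import math
import random

VOCAB_SIZE = 8   # hypothetical toy speech-token vocabulary size
GROUP_SIZE = 5   # hypothetical number of speech tokens produced per call


def toy_srh_logits(hidden, prev_token):
    """Dummy stand-in for the SRH head.

    The logits depend on the backbone hidden state AND on the previously
    sampled speech token, which is the dependency described above.
    """
    return [((hidden + 3 * prev_token + 7 * v) % 11) / 11.0
            for v in range(VOCAB_SIZE)]


def sample(logits, rng):
    """Sample one token id from softmax(logits)."""
    weights = [math.exp(x) for x in logits]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def srh_generate(hidden, start_token, rng):
    """Generate GROUP_SIZE speech tokens, feeding each sample back in."""
    tokens = []
    prev = start_token
    for _ in range(GROUP_SIZE):
        logits = toy_srh_logits(hidden, prev)
        prev = sample(logits, rng)   # sampling happens inside the loop
        tokens.append(prev)          # and the result is the next input
    return tokens


if __name__ == "__main__":
    print(srh_generate(hidden=4, start_token=0, rng=random.Random(0)))
```

Each iteration needs the token sampled in the previous one, so sampling has to happen inside the head's own loop. That is the part I cannot map onto an engine that samples and merges speech_id outside the model.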
