Question about rhythm transfer #5

@mvoodarla

I have the following use case: I would like to train urhythmic on a target speaker's voice and do any-to-one voice conversion that preserves the target timbre, while varying the rhythm, pitch, speed, intonation, etc. between predictions based on the intonation of a separate source audio clip. Is this possible?

I have successfully trained the vocoder, which sounds really good, and have tried running inference with different rhythm-fine models, but I can't really hear them affecting the output as much as the rhythm of the source clip does. In your sample, the outputs seem to have a rhythm truer to the target voices (though it's unclear whether you trained on just that sample or on the speaker in general).

Any tips or insights would be appreciated :)

One approach I'm currently trying is to retrain the rhythm model at inference time for both the source and the target speaker, letting it overfit to that single sample.
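A minimal sketch of what per-sample rhythm fitting could look like, to make the idea concrete. This is not urhythmic's API: the segment labels, the `mean_durations` and `transfer_rhythm` helpers, and the simple mean-matching rule are all assumptions for illustration, and the actual rhythm model may work quite differently. It assumes each utterance has already been segmented into labelled units with positive durations in frames.

```python
from collections import defaultdict


def mean_durations(segments):
    """Return the mean duration per label for a list of (label, frames) pairs."""
    totals = defaultdict(lambda: [0.0, 0])
    for label, frames in segments:
        totals[label][0] += frames
        totals[label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


def transfer_rhythm(source, reference):
    """Rescale source segments so per-label mean durations match the reference.

    Hypothetical helper: both inputs are single utterances, so the statistics
    are deliberately "overfit" to one sample each.
    """
    src_mean = mean_durations(source)
    ref_mean = mean_durations(reference)
    converted = []
    for label, frames in source:
        # Leave a segment unchanged if the reference never contains its label.
        scale = ref_mean.get(label, src_mean[label]) / src_mean[label]
        converted.append((label, max(1, round(frames * scale))))
    return converted


# Toy segmentations (made-up numbers, not real data).
source = [("sonorant", 10), ("obstruent", 4), ("sonorant", 6)]
reference = [("sonorant", 16), ("obstruent", 2)]
print(transfer_rhythm(source, reference))
```

The new durations would then drive time-stretching of the source features before vocoding; how well statistics from a single clip generalize is exactly the open question here.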

I also want to commend how clean this codebase is. This is how all OSS ML repos should be.
