Question about rhythm transfer #5

I have the following use case: I would like to train urhythmic on a target speaker's voice and do any-to-one voice conversion that preserves the target timbre, but I would like to vary the rhythm, pitch, speed, intonation, etc. between predictions, based on the intonation of a separate source audio clip. Is this possible?

I have successfully trained the vocoder, which sounds really good, and have tried inference with several different rhythm-fine models, but I can't really hear them affecting the output as much as the rhythm of the source clip does. In your sample, the outputs seem to have a rhythm closer to that of the target voices (though it's unclear whether you trained on just that sample or on the speaker in general).

Any tips or insights would be appreciated :)

One approach I'm currently trying is to retrain the rhythm model at inference time for both the source and the target speaker, overfitting it to that single sample.
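For context, here is a rough sketch of what fitting rhythm statistics to a single reference clip could look like. It is not urhythmic's rhythm-fine model or API: it assumes a hypothetical list of `(label, duration_seconds)` segments per clip (the labels are illustrative) and swaps in plain empirical quantile (rank) matching, so each source segment takes the duration found at the same rank among the reference clip's segments with the same label.

```python
from bisect import bisect_left

# Hypothetical segment format: (label, duration_seconds). The labels used
# below ("sonorant", "obstruent", "silence") are illustrative assumptions,
# not urhythmic's actual data structures.
Segment = tuple[str, float]


def quantile(sorted_vals: list[float], q: float) -> float:
    """Linearly interpolated quantile of a pre-sorted list, with q in [0, 1]."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def transfer_durations(source: list[Segment], reference: list[Segment]) -> list[Segment]:
    """Map each source duration onto the reference clip's duration
    distribution for the same label via empirical rank matching."""
    src_by_label: dict[str, list[float]] = {}
    ref_by_label: dict[str, list[float]] = {}
    for label, dur in source:
        src_by_label.setdefault(label, []).append(dur)
    for label, dur in reference:
        ref_by_label.setdefault(label, []).append(dur)
    for vals in list(src_by_label.values()) + list(ref_by_label.values()):
        vals.sort()

    out: list[Segment] = []
    for label, dur in source:
        ref = ref_by_label.get(label)
        if not ref:
            # No reference statistics for this label: keep the source duration.
            out.append((label, dur))
            continue
        src = src_by_label[label]
        # Rank of this duration among same-label source segments, scaled to [0, 1].
        q = bisect_left(src, dur) / max(len(src) - 1, 1)
        out.append((label, quantile(ref, q)))
    return out
```

The remapped durations would then still need to drive some time-stretching step, and this only addresses duration, not pitch or intonation.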

I also want to commend how clean this codebase is. This is how all OSS ML repos should be.
