Fine-tuning Ditto on one avatar image

If we fine-tune Ditto on a single avatar image and then generate a real-time talking head for that avatar from audio, will the resulting model be lighter and respond faster? If not, what are the best ways to achieve low latency with Ditto when the avatar image is known in advance (the single-avatar scenario)?
Does Ditto skip its identity-resemblance/identity-extraction step if we train it directly on the single avatar image that will also be used at inference time?
Can we precompute the identity features (and store the appearance features in memory) and reuse them to make the process faster?
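
To make the last question concrete, here is a rough sketch of the kind of caching I have in mind. It is not Ditto's API: `AvatarFeatureCache`, `fake_extract`, and `render_chunk` are hypothetical stand-ins for whatever the real source-feature extraction and audio-driven generation steps are. The only point is that the per-avatar extraction would run once and every later audio chunk would reuse the stored result.

```python
import hashlib


class AvatarFeatureCache:
    """Keeps per-avatar source features in memory so extraction runs once per image."""

    def __init__(self, extract_fn):
        self._extract_fn = extract_fn  # hypothetical: image bytes -> features
        self._cache = {}
        self.extract_calls = 0  # counts how often the expensive step actually ran

    def get(self, image_bytes):
        # Key on the image content so the same avatar always hits the cache.
        key = hashlib.sha256(image_bytes).hexdigest()
        if key not in self._cache:
            self.extract_calls += 1
            self._cache[key] = self._extract_fn(image_bytes)
        return self._cache[key]


def fake_extract(image_bytes):
    # Stand-in (assumption) for the expensive identity/appearance extraction.
    return {"appearance": len(image_bytes), "keypoints": list(image_bytes[:4])}


def render_chunk(features, audio_chunk):
    # Stand-in (assumption) for the per-chunk audio-driven generation step.
    return (features["appearance"], len(audio_chunk))


cache = AvatarFeatureCache(fake_extract)
avatar = b"avatar-image-bytes"

# Three audio chunks for the same avatar: extraction should run only once.
frames = [render_chunk(cache.get(avatar), chunk) for chunk in (b"aa", b"bbb", b"c")]
print(cache.extract_calls)
```

If something like this is already possible with the existing pipeline, a pointer to where the source features are computed would be enough.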

Fine tuning ditto on one avatar image #82

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Fine tuning ditto on one avatar image #82

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions