[Question] How do I add a custom generative video transformer into TransformerLens? #869

@EmilRyd

Question

I would like to do some mech interp on a generative video model: a diffusion model with temporal attention blocks. The model would have no text component; it would just be a transformer predicting the next video frame (like Sora and other SOTA generative video models), i.e. video-to-video (V2V). As far as I know, the current version of TransformerLens does not support a model like this, but I would like to start my research soon, so I would like to adapt TransformerLens to handle it. Two questions:

  1. Does it make sense for me to extend TransformerLens to this case? (i.e., would it be better to start from scratch and not use TransformerLens for this?)

  2. How would I go about doing this?

While question 1 may have a simple answer, I realize that question 2 may not. I am happy to have a longer conversation about this and to start working on it after that.

Metadata

Labels: complexity-high (Very complicated changes for people to address who are quite familiar with the code)
