Standard Transformer #4

@Twinkle-ce

Description

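Hello authors, how did you obtain the Standard Transformer results in your experiments? Following the architecture diagram in the paper, I modified the first attention layer and then changed the corresponding V, K, and Q, but the resulting F1 is only in the 60s. Could you share the specific implementation details? Thanks!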
