About the dimension projection

The linear projection after the self-attention is currently implemented as:
`bs = self_attention.size(0)`
`self_attention = self_attention.view(bs, -1)`
`linear_proj = F.relu(self.linear_projection(self_attention))`

The paper says: "We project the self-attended neighbor encodings to a LARGER 4x2d dimensional space". If you flatten the last two dimensions of `self_attention` before the projection, the input to the linear layer has neighbor x 2d features, which is smaller than 4x2d only when neighbor < 4. How can you make sure neighbor < 4?

In my opinion, we should not flatten the last two dimensions before the projection. Instead, we should apply the projection to the last dimension only, whose size is 2d. Since 2d < 4x2d, that projects each neighbor encoding into a larger space.
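To make the size argument concrete, here is a small sketch of the shape arithmetic only. It is not the repository's code: it assumes `self_attention` has shape `(bs, num_neighbors, 2d)` and that the projection outputs `4 * 2d` features, and the helper names `flattened_dims`, `per_neighbor_dims`, and `is_expansion` are made up for illustration.

```python
from typing import Tuple


def flattened_dims(num_neighbors: int, d: int) -> Tuple[int, int]:
    """(in_features, out_features) if the last two dims are flattened first.

    Assumes self_attention has shape (bs, num_neighbors, 2d), so
    view(bs, -1) yields num_neighbors * 2d input features.
    """
    return num_neighbors * 2 * d, 4 * 2 * d


def per_neighbor_dims(d: int) -> Tuple[int, int]:
    """(in_features, out_features) if the projection acts on the last dim only."""
    return 2 * d, 4 * 2 * d


def is_expansion(dims: Tuple[int, int]) -> bool:
    """True when the output space is strictly larger than the input space."""
    in_features, out_features = dims
    return out_features > in_features


if __name__ == "__main__":
    d = 64  # arbitrary value chosen for the example
    for n in (2, 4, 10):
        dims = flattened_dims(n, d)
        print("flattened, neighbors =", n, dims, is_expansion(dims))
    dims = per_neighbor_dims(d)
    print("per neighbor", dims, is_expansion(dims))
```

Under these assumptions, the flattened variant only expands the representation when there are fewer than 4 neighbors, while the per-neighbor variant always maps 2d to 4x2d regardless of the neighbor count.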

Please point out anything I have misunderstood, or let me know if you did this on purpose for some reason.
