I ran into an issue: the code does not explicitly define any class named CrossAttention, so the self-attention mechanism is never called, and as a result the attention-related loss stays at 0. How should this problem be solved?
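One common cause, assuming the code finds attention layers by comparing each module's class name to the string "CrossAttention": if the library names that class something else (later diffusers releases, for example, use `Attention`), nothing matches, no hook is registered, and the loss silently stays 0. The sketch below uses hypothetical stand-in classes (`Attention`, `FeedForward`, `Block`, `Model`) instead of real `torch.nn.Module` subclasses. It accepts several candidate class names and raises an error when no layer matches, so the problem can't go unnoticed:

```python
# Hypothetical stand-ins for a model's submodules; in real code these would be
# torch.nn.Module subclasses provided by the library you use.
class Attention:
    def __init__(self):
        self.hooked = False


class FeedForward:
    pass


class Block:
    def __init__(self):
        self.submodules = [Attention(), FeedForward()]


class Model:
    def __init__(self):
        self.submodules = [Block(), Block()]


# Assumption: older code expects "CrossAttention", newer libraries may use "Attention".
ATTENTION_CLASS_NAMES = {"CrossAttention", "Attention"}


def iter_modules(module):
    """Depth-first walk over a module and all of its nested submodules."""
    yield module
    for child in getattr(module, "submodules", []):
        yield from iter_modules(child)


def register_attention_hooks(root):
    """Mark every matching attention layer and return how many were found."""
    count = 0
    for m in iter_modules(root):
        if type(m).__name__ in ATTENTION_CLASS_NAMES:
            m.hooked = True  # a real implementation would install its hook here
            count += 1
    if count == 0:
        # Fail loudly instead of letting the attention loss silently stay 0.
        raise RuntimeError("no attention layers matched; check the class names")
    return count


model = Model()
print(register_attention_hooks(model))  # 2
```

In real code, the fastest way to check this is to print `type(m).__name__` for each module in the model and see which name its attention layers actually use.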