Congrats on the insightful paper!

I noticed a few points in the appendix figure that I find a bit confusing, and I have two questions:
- Since the 'Training Long Language Model' step uses a context length of only 224k, why does the model still achieve high accuracy when the context length reaches 512k?
- When the number of distractors is set to 5, the distribution of the NIAH results appears unusual: the 224k context length performs better than the 64k context length, which differs from what is typically seen in NIAH results for other models.
Looking forward to your insights on these points.
Best regards.