Missing num_tokens_seen.txt in AWS exemplar dataset #12

@BV003

Hi,

I was trying to download and use the AWS exemplar dataset, but I could not find num_tokens_seen.txt in the layer directories.

In the code, the original lines are:

num_tokens_seen = 0
for split in splits:
    layer_dir = self.get_layer_dir(layer, split)
    with open(os.path.join(layer_dir, "num_tokens_seen.txt"), "r") as f:
        num_tokens_seen += int(f.read())

Could you please confirm whether this file is included in the AWS dataset? If not, is there a recommended way to obtain the token count for the quantile calculations without it?
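
To make the question concrete, here is a minimal sketch of a defensive version of that loop. The helper name `sum_tokens_seen` and its return shape are my own assumptions and not part of the repository. It takes already-resolved layer directories, sums the counts it can find, and reports which directories lack the file. It does not guess a substitute value.

```python
import os
from typing import Iterable, List, Tuple

TOKENS_FILE = "num_tokens_seen.txt"


def sum_tokens_seen(layer_dirs: Iterable[str]) -> Tuple[int, List[str]]:
    """Sum token counts across layer directories.

    Hypothetical helper: returns the total from the files that exist,
    plus the list of directories where the file is missing.
    """
    total = 0
    missing: List[str] = []
    for layer_dir in layer_dirs:
        path = os.path.join(layer_dir, TOKENS_FILE)
        if not os.path.isfile(path):
            # Record the gap instead of raising FileNotFoundError.
            missing.append(layer_dir)
            continue
        with open(path, "r") as f:
            total += int(f.read().strip())
    return total, missing
```

With something like this, a non-empty `missing` list would show exactly which splits of the download are affected, which may help narrow down whether the file was never uploaded or only some directories are incomplete.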

Thanks!
