
add quality score script#69

Merged
AstroVasseL merged 9 commits into main from feature/add-quality-score
Jun 17, 2026

Conversation

@AstroVasseL

Collaborator

No description provided.

Comment thread asmtransformers/scripts/quality_score.py Outdated
Comment thread asmtransformers/scripts/quality_score.py Outdated
Comment thread asmtransformers/scripts/quality_score.py Outdated
Comment on lines +13 to +20
jtp_out_of_range_token = tokenizer['model']['vocab']['JUMP_ADDR_EXCEEDED']
jtp_unknown = tokenizer['model']['vocab']['UNK_JUMP_ADDR']
minlength = max(jtp_out_of_range_token, jtp_unknown) + 1
token = np.asarray(dataset['input_ids']).ravel()
token_to_id = {t['content']: t['id'] for t in tokenizer['added_tokens']}
padding_token = token_to_id['[SEP]']
token = token[token != padding_token]
bincount = np.bincount(token, minlength=minlength)

Member


I cannot quite get to grips with the numpy logic, but as discussed offline, I think I get it.
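
For anyone else reading the excerpt, here is a minimal sketch of what the counting logic appears to do, run on toy data. The `tokenizer` and `dataset` values and the `count_tokens` helper are hypothetical stand-ins made up for illustration, not the project's real vocabulary or API. The sketch mirrors the excerpt in filtering out `[SEP]`, and it assumes `input_ids` rows all have the same length.

```python
import numpy as np

# Hypothetical toy tokenizer and dataset; the ids below are invented.
tokenizer = {
    'model': {'vocab': {'JUMP_ADDR_EXCEEDED': 5, 'UNK_JUMP_ADDR': 6}},
    'added_tokens': [{'content': '[SEP]', 'id': 2}],
}
dataset = {'input_ids': [[1, 5, 3, 2], [1, 6, 6, 2]]}


def count_tokens(dataset, tokenizer):
    """Count how often each token id occurs, ignoring [SEP]."""
    vocab = tokenizer['model']['vocab']
    exceeded_id = vocab['JUMP_ADDR_EXCEEDED']
    unknown_id = vocab['UNK_JUMP_ADDR']
    # Make the result long enough that both special ids are valid
    # indices, even when neither token occurs in the data.
    minlength = max(exceeded_id, unknown_id) + 1
    # Flatten the 2-D batch of sequences into one long 1-D array.
    tokens = np.asarray(dataset['input_ids']).ravel()
    token_to_id = {t['content']: t['id'] for t in tokenizer['added_tokens']}
    # A boolean mask drops every occurrence of the [SEP] id.
    tokens = tokens[tokens != token_to_id['[SEP]']]
    # counts[i] is the number of times token id i appears.
    return np.bincount(tokens, minlength=minlength)


counts = count_tokens(dataset, tokenizer)
exceeded = counts[tokenizer['model']['vocab']['JUMP_ADDR_EXCEEDED']]
unknown = counts[tokenizer['model']['vocab']['UNK_JUMP_ADDR']]
print(counts.tolist(), int(exceeded), int(unknown))

# Without minlength, this second dataset would give an array of
# length 4, and indexing it at 5 or 6 would raise an IndexError.
sparse = count_tokens({'input_ids': [[1, 3, 2]]}, tokenizer)
```

Indexing the result of `np.bincount` by a token id gives that token's frequency directly, which is why `minlength` matters: it pads the counts with zeros so the two jump-address ids can always be looked up.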

@AstroVasseL AstroVasseL merged commit ae6d3bb into main Jun 17, 2026
4 checks passed


2 participants