NeedleBench, llama2-7b-hf: I cannot reproduce the score reported in the paper! #1493
ainilian
started this conversation in
Community Task
Replies: 1 comment 3 replies
We use a chat model. Can you help confirm this, @DseidLi?
Dataset: NeedleBench 4K Single-Retrieval; model: hf_llama2_7b
Score in the paper: Chinese = 32.42, English = 93.84, Overall = 63.13
Score I reproduced: Chinese = 10.31, English = 4.51, Overall = 7.41
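Both Overall figures are consistent with a plain unweighted mean of the Chinese and English scores. A minimal sketch of that check, assuming this is how the Overall column is aggregated (the helper name `overall` is hypothetical and not an OpenCompass API):

```python
def overall(zh: float, en: float) -> float:
    """Unweighted mean of the Chinese and English scores (assumed aggregation)."""
    return round((zh + en) / 2, 2)


# Scores quoted above: paper vs. my reproduction.
paper = overall(zh=32.42, en=93.84)
mine = overall(zh=10.31, en=4.51)

print(paper, mine)  # 63.13 7.41
```

Since both rows satisfy the same relation, the gap seems to come from the per-language scores themselves rather than from how they are combined.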
command
configs/eval_needlebench.py
result
20240903_095534
tabulate format
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
dataset                                                     llama-2-7b-hf
----------------------------------------------------------  ---------------
NeedleBench-Overall-Score                                   -
--------- NeedleBench-4K-Single-Needle-Retrieval ---------  -
Single-Needle-Retrieval(S-RT)                               7.41
Single-Needle-Retrieval-EN                                  4.51
Single-Needle-Retrieval-ZH                                  10.31
--------- NeedleBench-4K-Multi-Needle-Retrieval ---------   -
Multi-Needle-Retrieval(M-RT)                                -
Multi-Needle-Retrieval-EN                                   -
Multi-Needle-Retrieval-ZH                                   -
--------- NeedleBench-4K-Multi-Needle-Reasoning ---------   -
Multi-Needle-Reasoning(M-RS)                                -
Multi-Needle-Reasoning-EN                                   -
Multi-Needle-Reasoning-ZH                                   -
2-Needle-EN-4K                                              -
2-Needle-ZH-4K                                              -
3-Needle-EN-4K                                              -
3-Needle-ZH-4K                                              -
4-Needle-EN-4K                                              -
4-Needle-ZH-4K                                              -
5-Needle-EN-4K                                              -
5-Needle-ZH-4K                                              -
$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$