I am using the following command to test two methods:

CUDA_VISIBLE_DEVICES=0,1 python3 SpecForge/benchmarks/bench_eagle3.py --model-path Qwen/Qwen3-8B --speculative-draft-model-path path/to/model --port 30000 --trust-remote-code --mem-fraction-static 0.8 --tp-size 1 --attention-backend fa3 --config-list 1,6,10,32 --benchmark-list mtbench --dtype bfloat16

However, the results are not consistent across runs. The first time I tested, baseline A reached a TPS of 106 and baseline B a TPS of 100. The second time, baseline A reached a TPS of 93 and baseline B a TPS of 107. From the third run onward, both baselines reach a TPS of 93. I don't think randomness in the experiment is the cause, and the GPU and parameters are the same for every run. Has anyone faced the same issue, or does anyone know the possible reasons?