Describe the bug
Single-GPU evaluation of both non-quantized and quantized granite-3.3-8b-instruct returns extremely high perplexity when performed via the `eval_llm_1GPU` function, which processes the model block by block to stay within GPU memory constraints. Perplexity is as expected when running with the standard `Evaluator` (from `fms_mo.utils.eval_utils`), which performs inference on the full model.
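For context, the general block-by-block pattern can be sketched as follows. This is a minimal illustrative sketch, not the actual `eval_llm_1GPU` implementation: the function name `eval_block_by_block`, the toy callables, and the per-position probability interface are all assumptions made for illustration.

```python
# Illustrative sketch (NOT the fms_mo implementation): run each transformer
# block sequentially over cached activations, then compute perplexity from
# the final per-token probabilities. In a real setup each block would be
# moved to the GPU, applied, and moved back before the next block runs.
import math

def eval_block_by_block(hidden, blocks, lm_head, targets):
    """hidden: cached per-position activations; blocks: decoder blocks;
    lm_head: maps an activation to a probability vector over the vocab;
    targets: gold token ids. Returns perplexity = exp(mean NLL)."""
    for block in blocks:
        # Apply one block to every cached activation before the next block,
        # so only a single block needs to reside in GPU memory at a time.
        hidden = [block(h) for h in hidden]
    nll = 0.0
    for h, t in zip(hidden, targets):
        probs = lm_head(h)
        nll += -math.log(probs[t])
    return math.exp(nll / len(targets))
```

With identity blocks and a uniform head over a 4-token vocabulary, the perplexity is exactly 4.0, which is a quick sanity check that the accumulation is correct.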
Expected behavior
Baseline perplexity is expected.
Additional context
Part of the issue lies in block identification within `get_blocks`, which appears to assume the granite model uses the older BigCode architecture rather than the more recent Granite architecture. However, after fixing this problem, perplexity did not recover.
cc: @bayo-ibm @IqbalSaraf @chichun-charlie-liu