
Large-model evaluation on a single GPU fails #155

@andrea-fasoli

Description


Describe the bug

Single-GPU evaluation of granite-3.3-8b-instruct, both non-quantized and quantized, returns extremely high perplexity when performed via the eval_llm_1GPU function, which processes the model block by block to stay within GPU memory constraints. Perplexity is as expected when running with the standard Evaluator (from fms_mo.utils.eval_utils), which performs inference on the full model.

Expected behavior

Perplexity should match the baseline obtained with the standard Evaluator on the full model.

Additional context

Part of the issue resides in block identification within get_blocks, which appears to assume that the granite model uses the outdated BigCode architecture rather than the more recent Granite one. However, even after fixing this, perplexity did not recover, so at least one other cause remains.
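The architecture fix described above could look roughly like the following. This is a hedged sketch, not the actual fms_mo code: it dispatches on the model class name and uses the attribute paths that Hugging Face transformers exposes for these architectures (model.model.layers for Granite/Llama-style models, model.transformer.h for GPTBigCode).

```python
def get_blocks(model):
    """Sketch: return the list of decoder blocks for a model, keyed on
    the model class name instead of assuming one fixed architecture."""
    name = type(model).__name__
    if name in ("GraniteForCausalLM", "LlamaForCausalLM"):
        # Granite/Llama-style layout in HF transformers
        return model.model.layers
    if name == "GPTBigCodeForCausalLM":
        # older BigCode layout
        return model.transformer.h
    raise NotImplementedError(f"unsupported architecture: {name}")
```

Selecting the wrong branch here would hand the block-wise evaluator the wrong (or an empty) set of modules, which explains why this was a plausible first suspect even though fixing it alone did not restore baseline perplexity.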

cc: @bayo-ibm @IqbalSaraf @chichun-charlie-liu
