Error when loading the LLM checkpoint shards #11

@ShengenWu

Description

When loading the pretrained model with the eval scripts, we hit an error during checkpoint loading. The process always fails at 67% (2 of 3 shards loaded), and the traceback points to a problem loading the final shard file. We re-downloaded the checkpoint files on another server, but the error persists.

Error message

Loading checkpoint shards:  67%|████████████████████████████████████████████████████████████████████████████████ | 2/3 [00:10<00:05,  5.13s/it]
Traceback (most recent call last):
  File "xxxxx/GeoX/eval/inference.py", line 101, in <module>
    main(args)
  File "xxxxx/GeoX/eval/inference.py", line 26, in main
    tokenizer, model, image_processor, context_len = load_pretrained_model(model_path, device_map="cuda")
  File "xxxxx/GeoX/utils/developer.py", line 32, in load_pretrained_model
    model = GeoXLlamaForCausalLM.from_pretrained(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3706, in from_pretrained
    ) = cls._load_pretrained_model(
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in _load_pretrained_model
    state_dict = load_state_dict(shard_file)
  File "xxxxx/miniconda3/envs/geox/lib/python3.10/site-packages/transformers/modeling_utils.py", line 505, in load_state_dict
    if metadata.get("format") not in ["pt", "tf", "flax"]:
AttributeError: 'NoneType' object has no attribute 'get'

Steps to reproduce the behavior:

  1. Run bash scripts/eval_geoqa_top1.sh (we have tried the geoqa, geometry3k, and pgps9k eval scripts).
  2. Watch the loading progress reach ~67%; this happens for every checkpoint.
  3. The program crashes with the traceback above.

Environment:
As described in readme.md.

Additional context
It seems the metadata for one of the checkpoint shards might be None. Possibly a corrupted or incomplete file?
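
To tell a truncated file apart from a file that simply lacks metadata, a small check like the one below may help. It is only a sketch and assumes the shards are .safetensors files, which the failing `metadata.get("format")` line suggests. It reads the safetensors header directly (an 8-byte little-endian length followed by a JSON header) and compares the size implied by the header with the size on disk. The names `inspect_safetensors` and `report` are made up for this example and are not part of GeoX or transformers.

```python
import json
import struct
from pathlib import Path


def inspect_safetensors(path):
    """Return (metadata, expected_size, actual_size) for one .safetensors file.

    metadata is the optional "__metadata__" dict from the header, or None
    if the file was saved without one.
    """
    path = Path(path)
    actual_size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError(f"{path}: too short to hold a header length")
        # First 8 bytes: header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > actual_size - 8:
            raise ValueError(f"{path}: header length exceeds file size")
        header = json.loads(f.read(header_len))
    metadata = header.pop("__metadata__", None)
    # Every remaining entry describes a tensor; data_offsets are relative
    # to the start of the data section that follows the header.
    data_end = max((t["data_offsets"][1] for t in header.values()), default=0)
    expected_size = 8 + header_len + data_end
    return metadata, expected_size, actual_size


def report(model_dir):
    """Print one line per shard found in model_dir (hypothetical helper)."""
    for shard in sorted(Path(model_dir).glob("*.safetensors")):
        metadata, expected, actual = inspect_safetensors(shard)
        status = "size ok" if expected == actual else "SIZE MISMATCH"
        print(f"{shard.name}: metadata={metadata!r} {status} "
              f"(expected {expected}, actual {actual})")
```

Calling `report()` on the checkpoint directory prints one line per shard. If a shard shows `metadata=None` but the sizes match, the file is probably intact and was just saved without a format entry; in that case loading it and re-saving with `safetensors.torch.save_file(..., metadata={"format": "pt"})` might be a workaround, though we have not verified this on the GeoX checkpoint. A size mismatch would point to an incomplete download instead.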
