Models in safetensors format give incorrect predictions #459

Description

@ziyouchutuwenwu

I downloaded the model files manually from https://huggingface.co/google-bert/bert-base-uncased/tree/main

I saved the weights in two formats:

pytorch_model.bin
model.safetensors

Here is my directory layout:

$ tree models
models
└── bert-dl
    ├── pytorch
    │   ├── config.json
    │   ├── pytorch_model.bin
    │   ├── tokenizer_config.json
    │   └── tokenizer.json
    └── safetensors
        ├── config.json
        ├── model.safetensors
        ├── tokenizer_config.json
        └── tokenizer.json

4 directories, 8 files

Dependencies:

{:bumblebee, "~> 0.7.0"},
{:exla, ">= 0.0.0"}

Code:

defmodule Demo do
  def demo1 do
    {:ok, model_info} = Bumblebee.load_model({:local, "models/bert-dl/pytorch"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:local, "models/bert-dl/pytorch"})

    serving = Bumblebee.Text.fill_mask(model_info, tokenizer)
    Nx.Serving.run(serving, "The capital of [MASK] is Paris.")
  end

  def demo2 do
    {:ok, model_info} = Bumblebee.load_model({:local, "models/bert-dl/safetensors"})
    {:ok, tokenizer} = Bumblebee.load_tokenizer({:local, "models/bert-dl/safetensors"})

    serving = Bumblebee.Text.fill_mask(model_info, tokenizer)
    Nx.Serving.run(serving, "The capital of [MASK] is Paris.")
  end
end

For demo2 the result is incorrect: all scores are around 0.00xxx.

iex(1)> Demo.demo2

16:57:29.896 [debug] the following parameters were missing:

  * language_modeling_head.output.kernel


16:57:29.900 [debug] the following PyTorch parameters were unused:

  * bert.pooler.dense.bias
  * bert.pooler.dense.weight
  * cls.seq_relationship.bias
  * cls.seq_relationship.weight

%{
  predictions: [
    %{token: "alexander", score: 0.0018759453669190407},
    %{token: "##erving", score: 0.00136240862775594},
    %{token: "brazil", score: 0.001361749367788434},
    %{token: "muster", score: 0.0013256366364657879},
    %{token: ".", score: 0.0012544215423986316}
  ]
}

demo1 gives the correct result, with a top score of 0.92xxx:

iex(1)> Demo.demo1
%{
  predictions: [
    %{token: "france", score: 0.9279839992523193},
    %{token: "brittany", score: 0.008412548340857029},
    %{token: "algeria", score: 0.007433690130710602},
    %{token: "department", score: 0.004957552067935467},
    %{token: "reunion", score: 0.004369732923805714}
  ]
}
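
A possible way to narrow this down (a sketch, not verified here): the debug log in the demo2 run reports `language_modeling_head.output.kernel` as missing, so it may help to list which `cls.predictions.*` tensors the safetensors file actually contains. This assumes the `safetensors` Hex package, which Bumblebee pulls in as a dependency, is available in the project, and that `Safetensors.read!/1` returns a map of tensor names to tensors.

```elixir
# Assumption: run inside the same Mix project (e.g. in iex -S mix),
# so the Safetensors module is available via Bumblebee's dependencies.
path = "models/bert-dl/safetensors/model.safetensors"

path
# Read every tensor in the file into a map of name => tensor
|> Safetensors.read!()
|> Map.keys()
# Keep only the masked language modeling head parameters
|> Enum.filter(&String.starts_with?(&1, "cls.predictions"))
|> Enum.sort()
|> Enum.each(&IO.puts/1)
```

If no decoder weight shows up under `cls.predictions`, that would be consistent with the missing-parameter message, though this is only a guess at the cause.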
