Commit 02bc46d

svc-bionemo and claude committed
fix(vllm_inference): use local tokenizer for nvidia Hub reference model
The nvidia/esm2_t6_8M_UR50D Hub tokenizer_config.json references TokenizersBackend which was removed in transformers 5.x, causing AutoTokenizer.from_pretrained() to raise ValueError. Load the reference model's tokenizer from the local esm_fast_tokenizer directory (PreTrainedTokenizerFast) instead of from the Hub config.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Peter St. John <pstjohn@nvidia.com>
Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
1 parent 853a54b commit 02bc46d

1 file changed

Lines changed: 7 additions & 3 deletions

bionemo-recipes/recipes/vllm_inference/esm2/tests/test_vllm.py

@@ -61,12 +61,14 @@ def _last_token_l2(hidden_state: torch.Tensor) -> np.ndarray:
     return vec
 
 
-def _hf_embed(model_id: str, sequences: list[str], dtype=torch.float32) -> np.ndarray:
+def _hf_embed(model_id: str, sequences: list[str], dtype=torch.float32, tokenizer_id: str | None = None) -> np.ndarray:
     """Run HuggingFace inference and return last-token L2-normalised embeddings."""
     torch.manual_seed(42)
     torch.cuda.manual_seed_all(42)
     model = AutoModel.from_pretrained(model_id, trust_remote_code=True).to("cuda", dtype=dtype).eval()
-    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+    tokenizer = AutoTokenizer.from_pretrained(
+        tokenizer_id if tokenizer_id is not None else model_id, trust_remote_code=True
+    )
 
     vecs = []
     with torch.no_grad():
@@ -133,7 +135,9 @@ def hf_exported_embeddings(exported_checkpoint):
 @pytest.fixture(scope="session")
 def hf_reference_embeddings():
     """Embeddings from HuggingFace on the nvidia Hub model (ground truth)."""
-    return _hf_embed(REFERENCE_MODEL_ID, SEQUENCES)
+    # The nvidia Hub tokenizer_config.json references TokenizersBackend which was removed in
+    # transformers 5.x. Use the local PreTrainedTokenizerFast implementation instead.
+    return _hf_embed(REFERENCE_MODEL_ID, SEQUENCES, tokenizer_id=str(ESM2_MODEL_DIR / "esm_fast_tokenizer"))
 
 
 # ---- Tests ----
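
The change keeps the model weights coming from the Hub ID while letting the tokenizer come from a separate source, falling back to the model ID when no override is given. A minimal sketch of that resolution logic, isolated from transformers so it runs anywhere, might look like the following. The names `resolve_tokenizer_source` and `load_pair`, the stub loaders, and the `"esm_fast_tokenizer"` placeholder path are assumptions made for illustration and do not appear in the commit.

```python
from typing import Callable, Optional, Tuple


def resolve_tokenizer_source(model_id: str, tokenizer_id: Optional[str] = None) -> str:
    """Pick where the tokenizer is loaded from (hypothetical helper, not in the commit)."""
    # Compare against None explicitly so only an omitted override falls back to model_id.
    return tokenizer_id if tokenizer_id is not None else model_id


def load_pair(
    model_id: str,
    load_model: Callable[[str], object],
    load_tokenizer: Callable[[str], object],
    tokenizer_id: Optional[str] = None,
) -> Tuple[object, object]:
    """Load weights from model_id and the tokenizer from the resolved source."""
    return load_model(model_id), load_tokenizer(resolve_tokenizer_source(model_id, tokenizer_id))


# Stub loaders stand in for AutoModel / AutoTokenizer so the sketch runs without transformers.
model, tokenizer = load_pair(
    "nvidia/esm2_t6_8M_UR50D",
    load_model=lambda src: ("model", src),
    load_tokenizer=lambda src: ("tokenizer", src),
    tokenizer_id="esm_fast_tokenizer",  # placeholder path, assumed for illustration
)
```

Because the override defaults to `None`, existing callers that pass only a model ID keep their previous behaviour; only the reference-model fixture needs to opt in to the local tokenizer.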
