feat(deps): upgrade transformers to 5.x and sentence-transformers to 5.2+ (#295)
In transformers 4.57.x, the mistral-regex codepath called
`huggingface_hub.model_info()` on every tokenizer load with a vocabulary
larger than 100k tokens (e.g. `intfloat/multilingual-e5-*`), hammering HF's
rate limit in CI and in production. transformers 5.0+ caches that probe
per process and respects `local_files_only`/`HF_HUB_OFFLINE`.
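A minimal sketch of the per-process caching idea, assuming a hypothetical
`_remote_probe` stand-in for the Hub request. It illustrates the pattern
only and is not transformers' actual implementation. The offline check sits
outside the cache so that setting `HF_HUB_OFFLINE` mid-process is still
honored.

```python
import functools
import os
from typing import Optional

# Hypothetical stand-in for a Hub metadata request (e.g. model_info);
# the counter records how many "network" calls actually happen.
CALLS = {"count": 0}


def _remote_probe(repo_id: str) -> dict:
    CALLS["count"] += 1
    return {"id": repo_id}


def _hub_offline() -> bool:
    # Treat the usual truthy spellings of HF_HUB_OFFLINE as "offline".
    return os.environ.get("HF_HUB_OFFLINE", "").upper() in {"1", "ON", "YES", "TRUE"}


@functools.lru_cache(maxsize=None)
def _cached_probe(repo_id: str) -> dict:
    # At most one remote call per repo_id for the lifetime of the process.
    return _remote_probe(repo_id)


def probe_model_info(repo_id: str, local_files_only: bool = False) -> Optional[dict]:
    # Offline checks run before the cache, so they are re-evaluated on every call.
    if local_files_only or _hub_offline():
        return None
    return _cached_probe(repo_id)
```

Repeated loads of the same model hit the cache, and offline mode
short-circuits before any request is attempted.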
The bump is necessarily a coordinated two-package migration: ST 5.2.0 is
the first release that lifts the `transformers<5.0.0` cap. Resolved
versions: transformers 5.12.1, sentence-transformers 5.6.0.
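The version coupling can be expressed as a small check. This is a sketch:
`parse_version` and `versions_compatible` are hypothetical helpers, they
only handle plain numeric release strings, and they assume every
sentence-transformers release before 5.2.0 carried the
`transformers<5.0.0` cap.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain release string such as '5.12.1' into (5, 12, 1).

    Pre-release or local suffixes (e.g. '5.0.0rc1') are not handled.
    """
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)  # pad '5.2' to (5, 2, 0) so tuple comparisons line up
    return tuple(parts)


def versions_compatible(st_version: str, transformers_version: str) -> bool:
    """Check only the transformers<5.0.0 cap; other constraints are ignored."""
    if parse_version(transformers_version) < (5, 0, 0):
        return True  # the cap only matters for transformers 5.x
    # 5.2.0 is the first sentence-transformers release without the cap.
    return parse_version(st_version) >= (5, 2, 0)
```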
Adjusts the v5.x surfaces that actually broke:
- ranker.py: `cross_encoder.model.classifier` → `cross_encoder[0].auto_model.classifier`
(ST 5 restructured CrossEncoder into a nn.Sequential of modules).
- ranker.py: CrossEncoder.predict() renamed `activation_fct` → `activation_fn`.
- ranker.py: `cross_encoder.model.cpu()` → `cross_encoder.cpu()` (the wrapper
is itself an nn.Module now, no underlying `.model` attribute).
- embedder/sentence_transformers.py: import `losses`/`training_args` from
`sentence_transformers.sentence_transformer` (top-level path deprecated).
- embedder/sentence_transformers.py: `warmup_ratio=` → `warmup_steps=` (v5
TrainingArguments accepts a float <1.0 there as a ratio).
- test_sentence_transformers_backend.py: `get_sentence_embedding_dimension()`
→ `get_embedding_dimension()`.
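For code that has to run against both major versions, the renames above
can be wrapped in small duck-typed shims. These helpers are hypothetical
(this commit simply switches to the v5 names), and the round-up in
`resolve_warmup_steps` is an assumption about how a fractional
`warmup_steps` is turned into a step count.

```python
import math
from typing import Any, Dict


def get_classifier(cross_encoder: Any) -> Any:
    """Return the classification head of a CrossEncoder under either layout."""
    if hasattr(cross_encoder, "__getitem__"):
        # v5 layout: an nn.Sequential of modules; module 0 wraps the HF model.
        return cross_encoder[0].auto_model.classifier
    # Pre-v5 layout: the HF model hung off a `.model` attribute.
    return cross_encoder.model.classifier


def predict_kwargs(activation: Any, st_major: int) -> Dict[str, Any]:
    """Build the activation keyword for CrossEncoder.predict() by major version."""
    key = "activation_fn" if st_major >= 5 else "activation_fct"
    return {key: activation}


def embedding_dimension(model: Any) -> int:
    """Prefer the v5 getter and fall back to the pre-v5 name."""
    getter = getattr(model, "get_embedding_dimension", None)
    if getter is None:
        getter = model.get_sentence_embedding_dimension
    return getter()


def resolve_warmup_steps(warmup_steps: float, total_steps: int) -> int:
    """Treat 0 < warmup_steps < 1 as a fraction of total_steps, else as a step count."""
    if 0 < warmup_steps < 1:
        return math.ceil(warmup_steps * total_steps)  # rounding up is an assumption
    return int(warmup_steps)
```

Keeping each rename behind its own helper means the shims can be deleted
one by one once the pre-v5 code paths are no longer needed.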
Removes the `_disable_transformers_mistral_regex_patch` workaround from
tests/conftest.py — the underlying bug is fixed in v5.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
def _noop_patch_mistral_regex(  # type: ignore[no-untyped-def]  # reason: monkey-patched into transformers internal classmethod; transformers is in ignore_missing_imports so signature types are unavailable