feat(deps): upgrade transformers to 5.x and sentence-transformers to 5.2+ #341

Draft
voorhs wants to merge 7 commits into dev from migrate-295-transformers-v5


voorhs (Collaborator) commented Jun 27, 2026

Closes #295.

Why

transformers 4.57.x calls huggingface_hub.model_info() on every
tokenizer load for any model with vocab_size > 100000 (e.g. our default
intfloat/multilingual-e5-* family). That probe is uncacheable, fires
through the SHA pin, and 429s under CI matrix / parallel load — and
because the tests-only conftest monkey-patch isn't in production code,
real users on the classic/zero-shot-encoder presets hit it too.

transformers 5.0+ caches the probe per-process and respects
local_files_only / HF_HUB_OFFLINE (HF #45444). Per transformers
release notes the v5 fix is intentionally not backported to 4.x —
upgrading is the only path.

Scope of the bump

This is necessarily a coordinated two-package migration. sentence-transformers
3.x pins transformers<5.0.0, and the cap persists through ST 5.1.x;
ST 5.2.0 is the first release that lifts it to transformers<6.0.0.
Resolved versions: transformers==5.12.1, sentence-transformers==5.6.0.

Changes

  • pyproject.toml: bump both extras.
  • src/autointent/_wrappers/ranker.py — ST 5 restructured CrossEncoder
    into a nn.Sequential of modules:
    • cross_encoder.model.classifier → cross_encoder[0].auto_model.classifier
    • predict(activation_fct=...) → predict(activation_fn=...)
    • cross_encoder.model.cpu() → cross_encoder.cpu() (wrapper is itself a nn.Module).
  • src/autointent/_wrappers/embedder/sentence_transformers.py:
    • Import losses / training_args from sentence_transformers.sentence_transformer
      (top-level submodule path is deprecated in 5.x).
    • warmup_ratio= → warmup_steps= (v5 TrainingArguments accepts a
      float < 1.0 there as a fraction of total training steps).
  • tests/conftest.py: remove _disable_transformers_mistral_regex_patch
    (the underlying bug is fixed in v5).
  • tests/embedder/test_sentence_transformers_backend.py:
    get_sentence_embedding_dimension() → get_embedding_dimension().

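The renames listed above are applied directly in this PR. For code that has to run against both the old and the new method name, one possible pattern is a small lookup helper. The sketch below is an assumption for illustration only: `embedding_dimension`, `NewStyleModel`, and `OldStyleModel` are hypothetical names, and the stub classes merely mimic the two method names mentioned above.

```python
from typing import Any


def embedding_dimension(model: Any) -> int:
    """Return the embedding size, preferring the newer method name."""
    getter = getattr(model, "get_embedding_dimension", None)
    if getter is None:
        # Fall back to the older name; raises AttributeError if neither exists.
        getter = getattr(model, "get_sentence_embedding_dimension")
    return getter()


class NewStyleModel:
    """Stub exposing only the newer method name."""

    def get_embedding_dimension(self) -> int:
        return 384


class OldStyleModel:
    """Stub exposing only the older method name."""

    def get_sentence_embedding_dimension(self) -> int:
        return 768


new_dim = embedding_dimension(NewStyleModel())
old_dim = embedding_dimension(OldStyleModel())
```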
Test plan

Verified locally on Python 3.14:

  • pytest tests/embedder — 83 passed
  • pytest tests/modules/scoring/{test_dnnc,test_description_cross,test_rerank_scorer} — 7 passed (Ranker / CrossEncoder paths)
  • pytest tests/modules/test_dumper.py — 8 passed (HF model save/load)
  • pytest --collect-only — 611 tests collect cleanly
  • ruff check on changed files — clean
  • Full CI matrix — pending (intentionally pushed to let CI exercise the long suites)

🤖 Generated with Claude Code

voorhs and others added 7 commits June 27, 2026 22:42
…5.2+ (#295)

The 4.57.x mistral-regex codepath called `huggingface_hub.model_info()` on
every tokenizer load with vocab >100k (e.g. `intfloat/multilingual-e5-*`),
hammering HF's rate limit in CI and in production. transformers 5.0+ caches
that probe per-process and respects `local_files_only`/`HF_HUB_OFFLINE`.

The bump is necessarily a coordinated two-package migration: ST 5.2.0 is
the first release that lifts the `transformers<5.0.0` cap. Resolved
versions: transformers 5.12.1, sentence-transformers 5.6.0.

Adjusts the v5.x surfaces that actually broke:

- ranker.py: `cross_encoder.model.classifier` → `cross_encoder[0].auto_model.classifier`
  (ST 5 restructured CrossEncoder into a nn.Sequential of modules).
- ranker.py: CrossEncoder.predict() renamed `activation_fct` → `activation_fn`.
- ranker.py: `cross_encoder.model.cpu()` → `cross_encoder.cpu()` (the wrapper
  is itself an nn.Module now, no underlying `.model` attribute).
- embedder/sentence_transformers.py: import `losses`/`training_args` from
  `sentence_transformers.sentence_transformer` (top-level path deprecated).
- embedder/sentence_transformers.py: `warmup_ratio=` → `warmup_steps=` (v5
  TrainingArguments accepts a float <1.0 there as a ratio).
- test_sentence_transformers_backend.py: `get_sentence_embedding_dimension()`
  → `get_embedding_dimension()`.

Removes the `_disable_transformers_mistral_regex_patch` workaround from
tests/conftest.py — the underlying bug is fixed in v5.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

- Bump sentence-transformers lower bound 5.2.0 → 5.4.0. The new ranker /
  embedder paths (cross_encoder[0] subscript, sentence_transformers.
  sentence_transformer subpackage, get_embedding_dimension) all landed
  in 5.4.0; the previous floor would have ModuleNotFoundError'd /
  AttributeError'd anyone resolving 5.2.x–5.3.x.
- Constrain EmbedderFineTuningConfig.warmup_ratio to (0, 1). v5
  TrainingArguments interprets warmup_steps>=1 as a raw step count
  and <1 as a fraction, so a stray warmup_ratio=1.0 would silently
  produce one warmup step instead of full-training warmup.
- Refresh tests/test_deps.py synthetic metadata fixtures to v5
  version strings so the resolver tests exercise the version range
  we ship, not the v4 range we just left behind.
- Trim the v4→v5 narrating comments down to the WHY of the current
  code; per-line migration history belongs in the commit log.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Reviewer flagged that `gt=0` rejects the legal `warmup_ratio=0.0` config
(disable warmup). Relax to `ge=0`; `lt=1` is kept because that's the
v5 boundary where warmup_steps flips from ratio to raw step count.

Regenerate the published JSON schema so it reflects the constraint —
otherwise YAML authoring against the schema would pass schema
validation and fail at runtime.

Pushed back on the reviewer's claim that `warmup_steps=0.1` runs zero
warmup: transformers v5 typed `warmup_steps: float` and `get_warmup_steps`
branches on `>= 1`, not `> 0` — `0.1` takes the `math.ceil(N * 0.1)`
fraction branch (training_args.py:2089 in v5.12.1).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

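The warmup semantics described in the commits above (values `>= 1` are raw step counts, values `< 1` are a fraction of total steps) and the resulting `[0, 1)` constraint can be sketched in plain Python. This re-implements the branch as the commit message describes it and is not copied from transformers; `resolve_warmup_steps` and `validate_warmup_ratio` are hypothetical names.

```python
import math


def resolve_warmup_steps(warmup_steps: float, num_training_steps: int) -> int:
    """Mimic the described branch: >= 1 is a step count, < 1 is a fraction."""
    if warmup_steps >= 1:
        return int(warmup_steps)
    return math.ceil(num_training_steps * warmup_steps)


def validate_warmup_ratio(ratio: float) -> float:
    """Accept 0 (warmup disabled) up to, but not including, 1."""
    if not 0 <= ratio < 1:
        raise ValueError(f"warmup_ratio must be in [0, 1), got {ratio}")
    return ratio


# A ratio of 1.0 would be read as "one step", which is why it is rejected upstream.
one_step = resolve_warmup_steps(1.0, 200)
quarter = resolve_warmup_steps(validate_warmup_ratio(0.25), 200)
disabled = resolve_warmup_steps(validate_warmup_ratio(0.0), 200)
```

Rejecting `1.0` at config-validation time turns a silent misconfiguration into an immediate error.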
- _bert.py: coerce label2id/id2label keys to str. huggingface_hub 1.x
  StrictDataclassFieldValidationError rejects int-keyed label2id; the
  v5 AutoModelForSequenceClassification.from_pretrained pipeline now
  routes through that validator, so the previous {int: int} mapping
  raised on every BertScorer.fit (and cascaded into a fallback
  hf_hub_download call that the test guard caught as 'unpinned').
- ranker.py: cast cross_encoder[0] to Any for auto_model.classifier
  access (nn.Sequential.__getitem__ is typed Tensor | Module on v5);
  add arg-type ignores on CrossEncoder.predict(list[tuple[str,str]])
  calls — the v5 stub demands the much wider Sequence type but the
  list-of-pairs form is the documented call shape.
- Drop type: ignore comments mypy now reports as unused
  (AutoTokenizer.from_pretrained gained a typed stub in transformers
  v5; max_length matches TokenizerConfig.max_length cleanly).
- conftest.py: SentenceTransformer's constructor is typed Any on v5,
  so add no-any-return ignore at the fixture boundary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

A future refactor sees `{str(i): i}` as a no-op coercion and "simplifies"
back to `{i: i}`; mypy passes, then BertScorer.fit raises
StrictDataclassFieldValidationError at runtime. Comment makes the WHY
explicit at the call site, matching the WHY-only comment policy from
14f9576.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

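The `{str(i): i}` coercion discussed above can be sketched as follows. This is a minimal illustration under assumptions: `build_label2id` and `has_only_str_keys` are hypothetical helpers, and the key check is only a stand-in for the stricter validation the commit message attributes to huggingface_hub, not its real implementation.

```python
def build_label2id(n_classes: int) -> dict[str, int]:
    """Build a label map with string keys, as the strict validator expects."""
    # The str() call is deliberate: int keys would fail validation at runtime.
    return {str(i): i for i in range(n_classes)}


def has_only_str_keys(mapping: dict) -> bool:
    """Stand-in check: True when every key is a string."""
    return all(isinstance(key, str) for key in mapping)


label2id = build_label2id(3)
```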
When PEFT is installed, transformers v5 calls find_adapter_config_file
on every AutoModelForSequenceClassification.from_pretrained. The
auto_factory only propagates `_commit_hash` (used for the cache
lookup) but NOT the outer `revision` to the fall-through
hf_hub_download. On a cold cache — i.e. our CI warm-cache job, which
populates model files but no negative marker for adapter_config.json —
that probe fires `hf_hub_download(repo_id, adapter_config.json,
revision=None)` and our test guard rightly flagged it as unpinned.

Pass `adapter_kwargs={"revision": revision}` so the adapter probe
inherits the pin. The first run still writes a `.no_exist` marker, but
all subsequent runs (and CI's pinned-only contract) stay clean.

Reproduces with: rm -rf ~/.cache/huggingface/hub/models--prajjwal1--bert-tiny/.no_exist
then pytest tests/pipeline/test_inference.py::test_inference_from_config[multiclass].

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

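One way to keep the model pin and the adapter pin in sync is to build both from a single value. The helper below is a hypothetical sketch (`pinned_load_kwargs` is not a name from this PR); it only constructs the keyword arguments and loads nothing.

```python
from typing import Any


def pinned_load_kwargs(revision: str) -> dict[str, Any]:
    """Return from_pretrained kwargs that pin the model and the adapter probe."""
    return {
        "revision": revision,
        # Without this, the adapter_config.json probe would use revision=None.
        "adapter_kwargs": {"revision": revision},
    }


# Assumed usage (not executed here):
#   AutoModelForSequenceClassification.from_pretrained(name, **pinned_load_kwargs(sha))
kwargs = pinned_load_kwargs("abc123")
```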
PEFT's get_peft_model_state_dict (save_and_load.py:380-384) runs an
embedding-resize sanity check on every Trainer.save_checkpoint by
calling model.config.__class__.from_pretrained(base_model_name_or_path)
with no revision. transformers fills in revision='main' as the default,
so the call hits hf_hub_download('prajjwal1/bert-tiny',
'config.json', revision='main') — unpinned, which our CI guard
correctly flags. On a cold cache (CI), this trips on every
LoRA/PTuning trial that runs through Trainer.

Clear base_model_name_or_path on the peft_config after get_peft_model
so the vocab check short-circuits at `if model_id is not None`. Our
dumper (PeftModelDumper / HFModelDumper) saves the base model
separately and the load path passes it explicitly, so the adapter
config doesn't need to remember it.

Reproduces with:
  rm -rf ~/.cache/huggingface/hub/models--prajjwal1--bert-tiny/.no_exist
  pytest tests/pipeline/test_inference.py::test_inference_from_config[multiclass]

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
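
The fix described in the last commit can be sketched as a small helper that walks the adapter configs and clears the stored base model name. This is an illustration under assumptions: `clear_base_model_name` is a hypothetical name, and the `SimpleNamespace` objects only mimic the shape of a PEFT model whose `peft_config` maps adapter names to config objects.

```python
from types import SimpleNamespace
from typing import Any


def clear_base_model_name(peft_model: Any) -> None:
    """Drop the remembered base model id from every adapter config."""
    for adapter_config in peft_model.peft_config.values():
        adapter_config.base_model_name_or_path = None


# Stub mimicking a PEFT-wrapped model with a single "default" adapter.
fake_model = SimpleNamespace(
    peft_config={
        "default": SimpleNamespace(base_model_name_or_path="prajjwal1/bert-tiny")
    }
)
clear_base_model_name(fake_model)
```

This is only safe when the base model is saved and passed back explicitly at load time, which the commit message says the dumper already does.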
voorhs added the full-ci label (Run test suite on full OS and Python matrix) Jun 28, 2026
Development

Successfully merging this pull request may close these issues.

Upgrade transformers to 5.x (drops conftest mistral monkey-patch)
