Commit ac159e5

Merge branch 'main' into fix-deepspeed-ep-init
2 parents ae548bf + bca7eee commit ac159e5

117 files changed

Lines changed: 4214 additions & 1027 deletions

docker/transformers-pytorch-amd-gpu/Dockerfile

Lines changed: 13 additions & 1 deletion
@@ -4,13 +4,25 @@ LABEL maintainer="Hugging Face"
44
ARG DEBIAN_FRONTEND=noninteractive
55

66
RUN apt update && \
7-
apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip python3-dev ffmpeg git-lfs && \
7+
apt install -y --no-install-recommends git libsndfile1-dev tesseract-ocr espeak-ng python3 python3-dev python3-pip python3-dev ffmpeg git-lfs libjpeg-turbo8-dev libpng-dev zlib1g-dev && \
88
apt clean && \
99
rm -rf /var/lib/apt/lists/*
1010

1111
RUN git lfs install
1212

1313
RUN python3 -m pip install --no-cache-dir --upgrade pip numpy importlib-metadata setuptools wheel ninja pytesseract "itsdangerous<2.1.0"
14+
15+
# Rebuild torchvision so decode_image has libjpeg and ROCm image ops stay on GPU.
16+
RUN python3 -m pip install --no-cache-dir "setuptools<81" pybind11
17+
RUN TV_VERSION=$(python3 -c "import torchvision; print(torchvision.__version__.split('+')[0])") && \
18+
python3 -m pip uninstall -y torchvision && \
19+
git clone --depth 1 --branch "v${TV_VERSION}" https://github.com/pytorch/vision.git /tmp/vision && \
20+
cd /tmp/vision && \
21+
sed -i -E 's|list\(CSRS_DIR\.glob\("([^"]+\.cpp)"\)\)|[p for p in CSRS_DIR.glob("\1") if not p.name.endswith("_hip.cpp")]|g' setup.py && \
22+
FORCE_CUDA=1 TORCHVISION_USE_FFMPEG=0 TORCHVISION_USE_VIDEO_CODEC=0 \
23+
python3 -m pip install --no-cache-dir --no-build-isolation -v . && \
24+
cd / && rm -rf /tmp/vision
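The `sed` expression in the rebuild step is dense, so here is a minimal Python sketch of the same substitution. It assumes the target `setup.py` collects sources with expressions of the form `list(CSRS_DIR.glob("….cpp"))`; the sample line and the helper name `exclude_hip_sources` are hypothetical and are not taken from torchvision.

```python
import re

# Same pattern and replacement as the sed command, written for Python's re module.
PATTERN = r'list\(CSRS_DIR\.glob\("([^"]+\.cpp)"\)\)'
REPLACEMENT = r'[p for p in CSRS_DIR.glob("\1") if not p.name.endswith("_hip.cpp")]'


def exclude_hip_sources(line: str) -> str:
    """Rewrite a glob-based source list so files ending in _hip.cpp are filtered out."""
    return re.sub(PATTERN, REPLACEMENT, line)


# Hypothetical input line, used only to illustrate the rewrite.
sample = 'sources = list(CSRS_DIR.glob("ops/*.cpp"))'
rewritten = exclude_hip_sources(sample)
print(rewritten)

# Lines that do not match the pattern pass through unchanged.
untouched = exclude_hip_sources('headers = list(CSRS_DIR.glob("*.h"))')
print(untouched)
```

The rewrite keeps the original glob but wraps it in a list comprehension, so any file whose name ends in `_hip.cpp` is dropped from the build inputs.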
25+
1426
RUN python3 -m pip install --no-cache-dir --no-build-isolation git+https://github.com/facebookresearch/detectron2.git
1527

1628
ARG REF=main

docs/source/en/_toctree.yml

Lines changed: 4 additions & 0 deletions
@@ -661,6 +661,8 @@
661661
title: HunYuanDenseV1
662662
- local: model_doc/hunyuan_v1_moe
663663
title: HunYuanMoEV1
664+
- local: model_doc/hy_v3
665+
title: HYV3
664666
- local: model_doc/ibert
665667
title: I-BERT
666668
- local: model_doc/jais2
@@ -1365,6 +1367,8 @@
13651367
title: SigLIP
13661368
- local: model_doc/siglip2
13671369
title: SigLIP2
1370+
- local: model_doc/slanet
1371+
title: SLANet
13681372
- local: model_doc/slanext
13691373
title: SLANeXt
13701374
- local: model_doc/smollm3

docs/source/en/model_doc/glm_moe_dsa.md

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ limitations under the License.
1616
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.
1717
1818
-->
19-
*This model was released on {release_date} and added to Hugging Face Transformers on 2026-02-08.*
19+
*This model was released on 2026-02-17 and added to Hugging Face Transformers on 2026-02-09.*
2020

2121
<div style="float: right;">
2222
<div class="flex flex-wrap space-x-1">

docs/source/en/model_doc/hy_v3.md

Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
1+
<!--Copyright 2026 THL A29 Limited, a Tencent company and The HuggingFace Inc. team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
12+
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
13+
rendered properly in your Markdown viewer.
14+
15+
-->
16+
*This model was released on {release_date} and added to Hugging Face Transformers on 2026-04-22.*
17+
18+
# Hy3-preview
19+
20+
## Overview
21+
22+
Hy3-preview is a large-scale Mixture-of-Experts (MoE) language model developed by the Tencent HunYuan team. It uses a dense-MoE hybrid architecture with 192 routed experts and 1 always-active shared expert per MoE layer. Because only a few routed experts run for each token, per-token inference cost stays well below what the total parameter count would suggest.
23+
24+
Key architectural features:
25+
26+
- **Dense-MoE hybrid**: The first layer uses a dense FFN; all subsequent layers use MoE with top-k routing (default k=8).
27+
- **Shared experts**: Each MoE layer includes 1 shared expert that processes all tokens alongside the routed experts.
28+
- **Sigmoid routing with expert-bias correction**: Tokens are routed via sigmoid scoring (not softmax) with a learned per-expert bias for load balancing.
29+
- **QK-Norm**: Per-head RMSNorm applied to query and key projections before attention for improved training stability.
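
The sigmoid routing described in the list above can be illustrated with a small, dependency-free sketch. This is not the HYV3 implementation. It assumes that the per-expert bias influences only which experts are selected and that the gate weights are the unbiased sigmoid scores renormalized over the selected experts. Both are common conventions for this routing style, and neither is confirmed by this page. The function name `route_token` and the toy values are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(router_logits, expert_bias, top_k):
    """Pick top_k experts for one token using sigmoid scores plus a selection bias.

    Assumption: the bias changes only the ranking, not the final gate weights.
    """
    scores = [sigmoid(logit) for logit in router_logits]
    biased = [score + bias for score, bias in zip(scores, expert_bias)]
    # Rank experts by biased score, highest first, and keep the first top_k.
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    # Assumption: gate weights are the unbiased scores, renormalized to sum to 1.
    total = sum(scores[i] for i in chosen)
    weights = [scores[i] / total for i in chosen]
    return chosen, weights


# Toy example with 4 experts and k=2 (the model itself uses 192 experts and k=8).
logits = [2.0, 0.0, -1.0, 1.0]
no_bias = route_token(logits, [0.0, 0.0, 0.0, 0.0], top_k=2)
with_bias = route_token(logits, [0.0, 0.0, 3.0, 0.0], top_k=2)
print(no_bias[0], with_bias[0])
```

In the toy example, a large bias on the third expert pulls it into the selected set even though its raw score is the lowest. This is how a bias term can rebalance load without an auxiliary loss.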
30+
31+
## Usage tips
32+
33+
- Load with `AutoModelForCausalLM`. The model requires multiple GPUs due to its size.
34+
- Set `output_router_logits=True` in the config or forward call to collect per-layer MoE router logits. Note that this model does not compute an auxiliary load-balancing loss; `aux_loss` is always `None`.
35+
- The model supports `gradient_checkpointing` to reduce memory during fine-tuning.
36+
37+
```python
38+
from transformers import AutoTokenizer, AutoModelForCausalLM
39+
40+
model_id = "tencent/Hy3-preview"
41+
tokenizer = AutoTokenizer.from_pretrained(model_id)
42+
model = AutoModelForCausalLM.from_pretrained(
43+
model_id,
44+
device_map="auto",
45+
)
46+
47+
inputs = tokenizer("The future of artificial intelligence is", return_tensors="pt").to(model.device)
48+
outputs = model.generate(**inputs, max_new_tokens=64)
49+
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
50+
```
51+
52+
## HYV3Config
53+
54+
[[autodoc]] HYV3Config
55+
56+
## HYV3Model
57+
58+
[[autodoc]] HYV3Model
59+
- forward
60+
61+
## HYV3ForCausalLM
62+
63+
[[autodoc]] HYV3ForCausalLM
64+
- forward

docs/source/en/model_doc/slanet.md

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
<!--Copyright 2026 The HuggingFace Team. All rights reserved.
2+
3+
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
4+
the License. You may obtain a copy of the License at
5+
6+
http://www.apache.org/licenses/LICENSE-2.0
7+
8+
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
9+
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
10+
specific language governing permissions and limitations under the License.
11+
12+
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
13+
rendered properly in your Markdown viewer.
14+
15+
-->
16+
*This model was released on 2025-03-07 and added to Hugging Face Transformers on 2026-04-22.*
17+
18+
# SLANet
19+
20+
<div class="flex flex-wrap space-x-1">
21+
<img alt="PyTorch" src="https://img.shields.io/badge/PyTorch-DE3412?style=flat&logo=pytorch&logoColor=white">
22+
</div>
23+
24+
## Overview
25+
26+
**SLANet** and **SLANet_plus** belong to a series of lightweight models dedicated to table structure recognition in documents and natural scenes. For more details about the SLANet series, refer to the [official documentation](https://www.paddleocr.ai/latest/en/version3.x/module_usage/table_structure_recognition.html).
27+
28+
## Model Architecture
29+
30+
SLANet is a table structure recognition model developed by the Baidu PaddlePaddle Vision Team. It improves both the accuracy and the inference speed of table structure recognition by combining three components: PP-LCNet, a CPU-friendly lightweight backbone; CSP-PAN, a module that fuses high-level and low-level features; and SLA Head, a decoding module that aligns structural and positional information.
31+
32+
## Usage
33+
34+
### Single input inference
35+
36+
The example below demonstrates how to recognize table structure with SLANet using [`AutoModelForTableRecognition`].
37+
38+
<hfoptions id="usage">
39+
<hfoption id="AutoModel">
40+
41+
```py
42+
from io import BytesIO
43+
44+
import httpx
45+
from PIL import Image
46+
from transformers import AutoImageProcessor, AutoModelForTableRecognition
47+
48+
model_path = "PaddlePaddle/SLANet_plus_safetensors"
49+
model = AutoModelForTableRecognition.from_pretrained(model_path, device_map="auto")
50+
image_processor = AutoImageProcessor.from_pretrained(model_path)
51+
52+
image = Image.open(BytesIO(httpx.get(image_url).content))  # image_url is not defined in this snippet; set it to the URL of a table image first
53+
inputs = image_processor(images=image, return_tensors="pt").to(model.device)
54+
outputs = model(**inputs)
55+
56+
result = image_processor.post_process_table_recognition(outputs)
57+
58+
print(result['structure'])
59+
print(result['structure_score'])
60+
```
61+
62+
</hfoption>
63+
</hfoptions>
64+
65+
## SLANetConfig
66+
67+
[[autodoc]] SLANetConfig
68+
69+
## SLANetForTableRecognition
70+
71+
[[autodoc]] SLANetForTableRecognition
72+
73+
## SLANetBackbone
74+
75+
[[autodoc]] SLANetBackbone
76+
77+
## SLANetSLAHead
78+
79+
[[autodoc]] SLANetSLAHead
80+

docs/source/en/serve-cli/serving.md

Lines changed: 81 additions & 0 deletions
@@ -24,6 +24,7 @@ The `transformers serve` CLI is a lightweight option for local or self-hosted se
2424
The `transformers serve` command spawns a local server compatible with the [OpenAI SDK](https://platform.openai.com/docs/overview). The server works with many third-party applications and supports the REST APIs below.
2525

2626
- `/v1/chat/completions` for text, image, audio, and video requests
27+
- `/v1/completions` for legacy text completions from a freeform prompt
2728
- `/v1/responses` supports the [Responses API](https://platform.openai.com/docs/api-reference/responses)
2829
- `/v1/audio/transcriptions` for audio transcriptions
2930
- `/v1/models` lists available models for third-party integrations
@@ -959,6 +960,86 @@ The follow-up question "How many people live there?" relies on the prior context
959960
As of 2021, the population of Paris is approximately 2.2 million people.
960961
```
961962

963+
## v1/completions
964+
965+
The `v1/completions` API is based on the [legacy Completions API](https://platform.openai.com/docs/api-reference/completions). Unlike `/v1/chat/completions`, it takes a freeform text `prompt` instead of chat messages and returns generated text in `choices[].text`. This is useful for base (non-instruct) models and text completion tasks where a chat template is not needed. It also supports `suffix` for fill-in-the-middle text insertion.
966+
967+
<hfoptions id="legacy-completion">
968+
<hfoption id="curl">
969+
970+
```shell
971+
curl -X POST http://localhost:8000/v1/completions \
972+
-H "Content-Type: application/json" \
973+
-d '{
974+
"model": "Qwen/Qwen2.5-0.5B",
975+
"prompt": "The capital of France is",
976+
"max_tokens": 20
977+
}'
978+
```
979+
980+
The command returns the following response.
981+
982+
```json
983+
{
984+
"id": "chatcmpl-abc123",
985+
"object": "text_completion",
986+
"created": 1234567890,
987+
"model": "Qwen/Qwen2.5-0.5B@main",
988+
"choices": [
989+
{
990+
"text": " Paris, and the capital of the United States is Washington, D.C.",
991+
"index": 0,
992+
"logprobs": null,
993+
"finish_reason": "stop"
994+
}
995+
],
996+
"usage": {
997+
"prompt_tokens": 5,
998+
"completion_tokens": 16,
999+
"total_tokens": 21
1000+
}
1001+
}
1002+
```
1003+
1004+
</hfoption>
1005+
<hfoption id="openai">
1006+
1007+
```python
1008+
from openai import OpenAI
1009+
1010+
client = OpenAI(base_url="http://localhost:8000/v1", api_key="<random_string>")
1011+
1012+
# Non-streaming
1013+
completion = client.completions.create(
1014+
model="Qwen/Qwen2.5-0.5B",
1015+
prompt="The capital of France is",
1016+
max_tokens=20,
1017+
)
1018+
print(completion.choices[0].text)
1019+
```
1020+
1021+
The [OpenAI](https://platform.openai.com/docs/quickstart) client returns the following.
1022+
1023+
```shell
1024+
Paris, and the capital of the United States is Washington, D.C.
1025+
```
1026+
1027+
Streaming is also supported.
1028+
1029+
```python
1030+
stream = client.completions.create(
1031+
model="Qwen/Qwen2.5-0.5B",
1032+
prompt="The capital of France is",
1033+
max_tokens=20,
1034+
stream=True,
1035+
)
1036+
for chunk in stream:
1037+
print(chunk.choices[0].text, end="")
1038+
```
1039+
1040+
</hfoption>
1041+
</hfoptions>
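
The `suffix` parameter mentioned in the introduction to this section is not covered by the examples above. The sketch below builds such a request with only the standard library. The helper name `build_completion_request`, the prompt, and the suffix are hypothetical, and it assumes the server runs at `http://localhost:8000`, as in the curl example. Whether the generated insertion is useful depends on the model, which is not verified here.

```python
import json
import urllib.request


def build_completion_request(
    prompt,
    suffix=None,
    model="Qwen/Qwen2.5-0.5B",
    max_tokens=20,
    base_url="http://localhost:8000/v1",
):
    """Build a POST request for /v1/completions, optionally with a suffix."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if suffix is not None:
        # Text that should follow the generated insertion (fill-in-the-middle).
        payload["suffix"] = suffix
    return urllib.request.Request(
        f"{base_url}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_completion_request(
    prompt="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))",
)
body = json.loads(request.data)
print(sorted(body))

# To send the request against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read())["choices"][0]["text"])
```

The network call stays commented out so the snippet runs without a server. Uncomment it once `transformers serve` is running.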
1042+
9621043
## v1/responses
9631044

9641045
The [Responses API](https://platform.openai.com/docs/api-reference/responses) is OpenAI's latest API endpoint for generation. It supports stateful interactions and integrates built-in tools to extend a model's capabilities. OpenAI [recommends](https://platform.openai.com/docs/guides/migrate-to-responses) using the Responses API over the Chat Completions API for new projects.

setup.py

Lines changed: 2 additions & 2 deletions
@@ -124,7 +124,7 @@
124124
"rjieba",
125125
"rouge-score!=0.0.7,!=0.0.8,!=0.1,!=0.1.1",
126126
"ruff==0.14.10",
127-
"transformers-mlinter @ git+https://github.com/huggingface/transformers-mlinter@b9d319ce264c106f97a959d926ef42bc3c0ea4d1",
127+
"transformers-mlinter==0.1.0",
128128
"ty==0.0.20",
129129
# `sacrebleu` not used in `transformers`. However, it is needed in several tests, when a test calls
130130
# `evaluate.load("sacrebleu")`. This metric is used in the examples that we use to test the `Trainer` with, in the
@@ -295,7 +295,7 @@ def finalize_options(self):
295295
pass
296296

297297
def run(self):
298-
if SUPPORTED_PYTHON_VERSIONS[0] >= PYTHON_MINOR_VERSION:
298+
if SUPPORTED_PYTHON_VERSIONS[0] > PYTHON_MINOR_VERSION:
299299
print(
300300
f"Table updated only when running 3.{SUPPORTED_PYTHON_VERSIONS[0]}.x, detected version is {sys.version}."
301301
)
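
The comparison change in this hunk (`>=` to `>`) only matters when the running minor version equals the first supported one. A minimal sketch of that boundary case follows. The function name `guard_triggers` and the minor-version values are hypothetical, and the body that the guard protects is not shown in this diff.

```python
def guard_triggers(first_supported_minor: int, running_minor: int, inclusive: bool) -> bool:
    """Return True when the version guard's branch would be taken.

    inclusive=True mirrors the old `>=` comparison, inclusive=False the new `>`.
    """
    if inclusive:
        return first_supported_minor >= running_minor
    return first_supported_minor > running_minor


# Hypothetical values: first supported minor version is 10.
old_on_equal = guard_triggers(10, 10, inclusive=True)
new_on_equal = guard_triggers(10, 10, inclusive=False)
new_on_older = guard_triggers(10, 9, inclusive=False)
new_on_newer = guard_triggers(10, 11, inclusive=False)
print(old_on_equal, new_on_equal, new_on_older, new_on_newer)
```

With the old comparison the branch is also taken when both versions are equal. With the new one it is taken only when the running interpreter is older than the first supported version.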

src/transformers/cli/serve.py

Lines changed: 8 additions & 0 deletions
@@ -83,6 +83,7 @@ def __init__(
8383
import uvicorn
8484

8585
from .serving.chat_completion import ChatCompletionHandler
86+
from .serving.completion import CompletionHandler
8687
from .serving.model_manager import ModelManager
8788
from .serving.response import ResponseHandler
8889
from .serving.server import build_server
@@ -131,6 +132,11 @@ def __init__(
131132
generation_state=self._generation_state,
132133
)
133134

135+
self._completion_handler = CompletionHandler(
136+
model_manager=self._model_manager,
137+
generation_state=self._generation_state,
138+
)
139+
134140
self._response_handler = ResponseHandler(
135141
model_manager=self._model_manager,
136142
generation_state=self._generation_state,
@@ -141,6 +147,7 @@ def __init__(
141147
app = build_server(
142148
self._model_manager,
143149
self._chat_handler,
150+
completion_handler=self._completion_handler,
144151
response_handler=self._response_handler,
145152
transcription_handler=self._transcription_handler,
146153
enable_cors=enable_cors,
@@ -183,6 +190,7 @@ def kill_server(self):
183190
\b
184191
Endpoints:
185192
POST /v1/chat/completions — Chat completions (streaming + non-streaming).
193+
POST /v1/completions — Legacy text completions from a prompt.
186194
GET /v1/models — Lists available models.
187195
GET /health — Health check.
188196
