docs/source/en/model_doc/glm_moe_dsa.md
Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be rendered properly in your Markdown viewer.

-->
-*This model was released on {release_date} and added to Hugging Face Transformers on 2026-02-08.*
+*This model was released on 2026-02-17 and added to Hugging Face Transformers on 2026-02-09.*
<!--Copyright 2026 THL A29 Limited, a Tencent company and The HuggingFace Inc. team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
*This model was released on {release_date} and added to Hugging Face Transformers on 2026-04-22.*

# Hy3-preview

## Overview

Hy3-preview is a large-scale Mixture-of-Experts (MoE) language model developed by the Tencent HunYuan team. It features a dense-MoE hybrid architecture with 192 routed experts and 1 always-active shared expert per MoE layer, achieving strong performance with efficient inference via sparse expert activation.

Key architectural features:

- **Dense-MoE hybrid**: The first layer uses a dense FFN; all subsequent layers use MoE with top-k routing (default k=8).
- **Shared experts**: Each MoE layer includes 1 shared expert that processes all tokens alongside the routed experts.
- **Sigmoid routing with expert-bias correction**: Tokens are routed via sigmoid scoring (not softmax) with a learned per-expert bias for load balancing.
- **QK-Norm**: Per-head RMSNorm applied to query and key projections before attention for improved training stability.

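The routing described in the list above can be sketched in plain Python. This is a toy illustration, not the model's implementation: the function names (`route_token`, `moe_layer`), the scalar "experts", and two design details are assumptions that the description does not confirm, namely that the per-expert bias only influences which experts are selected, and that the mixing weights are the unbiased sigmoid scores renormalized over the selected experts.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route_token(router_logits, expert_bias, top_k=8):
    """Pick top_k experts for one token and return {expert_index: weight}."""
    # Sigmoid scores each expert independently (no softmax competition).
    scores = [sigmoid(z) for z in router_logits]
    # Assumption: the bias only shifts which experts get selected.
    biased = [s + b for s, b in zip(scores, expert_bias)]
    chosen = sorted(range(len(scores)), key=lambda i: biased[i], reverse=True)[:top_k]
    # Assumption: mixing weights use the unbiased scores, renormalized to sum to 1.
    total = sum(scores[i] for i in chosen)
    return {i: scores[i] / total for i in chosen}


def moe_layer(x, router_logits, expert_bias, experts, shared_expert, top_k=8):
    """Shared-expert output plus the weighted sum of the selected routed experts."""
    weights = route_token(router_logits, expert_bias, top_k)
    routed = sum(w * experts[i](x) for i, w in weights.items())
    return shared_expert(x) + routed


# Toy setup: 192 routed "experts" that scale a scalar input, plus one shared expert.
NUM_EXPERTS = 192
router_logits = [i / 100 for i in range(NUM_EXPERTS)]
expert_bias = [0.0] * NUM_EXPERTS
expert_bias[0] = 5.0  # a large bias pulls the lowest-scoring expert into the top-k
experts = [lambda x, s=i: s * x for i in range(NUM_EXPERTS)]

weights = route_token(router_logits, expert_bias)
output = moe_layer(1.0, router_logits, expert_bias, experts, shared_expert=lambda x: x)
print(sorted(weights))
```

In this sketch the shared expert runs for every token regardless of routing, while only 8 of the 192 routed experts contribute to the output, which is where the sparse-activation savings come from.
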
## Usage tips

- Load with `AutoModelForCausalLM`. The model requires multiple GPUs due to its size.
- Set `output_router_logits=True` in the config or forward call to collect per-layer MoE router logits. Note that this model does not compute an auxiliary load-balancing loss; `aux_loss` is always `None`.
- The model supports `gradient_checkpointing` to reduce memory during fine-tuning.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
```
**SLANet** and **SLANet_plus** are part of a series of dedicated lightweight models for table structure recognition, focusing on accurately recognizing table structures in documents and natural scenes. For more details about the SLANet series model, please refer to the [official documentation](https://www.paddleocr.ai/latest/en/version3.x/module_usage/table_structure_recognition.html).

## Model Architecture

SLANet is a table structure recognition model developed by Baidu PaddlePaddle Vision Team. The model significantly improves the accuracy and inference speed of table structure recognition by adopting a CPU-friendly lightweight backbone network PP-LCNet, a high-low-level feature fusion module CSP-PAN, and a feature decoding module SLA Head that aligns structural and positional information.

## Usage

### Single input inference

The example below demonstrates how to recognize table structure with SLANet using [`AutoModel`].

<hfoptions id="usage">
<hfoption id="AutoModel">

```py
from io import BytesIO

import httpx
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForTableRecognition

model_path = "PaddlePaddle/SLANet_plus_safetensors"
model = AutoModelForTableRecognition.from_pretrained(model_path, device_map="auto")
```
docs/source/en/serve-cli/serving.md
Lines changed: 81 additions & 0 deletions
@@ -24,6 +24,7 @@ The `transformers serve` CLI is a lightweight option for local or self-hosted se
The `transformers serve` command spawns a local server compatible with the [OpenAI SDK](https://platform.openai.com/docs/overview). The server works with many third-party applications and supports the REST APIs below.

- `/v1/chat/completions` for text, image, audio, and video requests
- `/v1/completions` for legacy text completions from a freeform prompt
- `/v1/responses` supports the [Responses API](https://platform.openai.com/docs/api-reference/responses)
- `/v1/audio/transcriptions` for audio transcriptions
- `/v1/models` lists available models for third-party integrations
@@ -959,6 +960,86 @@ The follow-up question "How many people live there?" relies on the prior context
As of 2021, the population of Paris is approximately 2.2 million people.
```

## v1/completions

The `v1/completions` API is based on the [legacy Completions API](https://platform.openai.com/docs/api-reference/completions). Unlike `/v1/chat/completions`, it takes a freeform text `prompt` instead of chat messages and returns generated text in `choices[].text`. This is useful for base (non-instruct) models and text completion tasks where a chat template is not needed. It also supports `suffix` for fill-in-the-middle text insertion.

<hfoptions id="legacy-completion">
<hfoption id="curl">

```shell
curl -X POST http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B",
    "prompt": "The capital of France is",
    "max_tokens": 20
  }'
```

The command returns the following response.

```json
{
  "id": "chatcmpl-abc123",
  "object": "text_completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2.5-0.5B@main",
  "choices": [
    {
      "text": " Paris, and the capital of the United States is Washington, D.C.",
```
The [OpenAI](https://platform.openai.com/docs/quickstart) client returns the following.

```shell
Paris, and the capital of the United States is Washington, D.C.
```

Streaming is also supported.

```python
stream = client.completions.create(
    model="Qwen/Qwen2.5-0.5B",
    prompt="The capital of France is",
    max_tokens=20,
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].text, end="")
```

</hfoption>
</hfoptions>

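The `suffix` parameter is mentioned in this section but not exercised by its examples. The sketch below shows how such a request body might be built with only the Python standard library. The helper names (`build_request`, `completion_text`, `send`) are hypothetical, the base URL and model name are copied from the curl example, and the prompt and suffix strings are made up for illustration. Whether a given model produces useful infill text for a `suffix` depends on the model and on how the server handles the field, neither of which is specified here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # same address as the curl example


def build_request(prompt, suffix=None, model="Qwen/Qwen2.5-0.5B", max_tokens=20):
    """Build a POST request for /v1/completions; `suffix` is added only when given."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if suffix is not None:
        payload["suffix"] = suffix
    return urllib.request.Request(
        f"{BASE_URL}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def completion_text(response_body):
    """Extract the generated text from a parsed response (choices[].text)."""
    return response_body["choices"][0]["text"]


def send(request):
    """Send the request to a running server and return the parsed JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) a fill-in-the-middle style request.
request = build_request("def add(a, b):\n    ", suffix="\n\nprint(add(1, 2))")
print(request.full_url)
print(request.data.decode("utf-8"))
```

With a server running, `completion_text(send(request))` would return the generated text. Building the request separately from sending it keeps the payload easy to inspect before it reaches the server.
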
## v1/responses

The [Responses API](https://platform.openai.com/docs/api-reference/responses) is OpenAI's latest API endpoint for generation. It supports stateful interactions and integrates built-in tools to extend a model's capabilities. OpenAI [recommends](https://platform.openai.com/docs/guides/migrate-to-responses) using the Responses API over the Chat Completions API for new projects.