
Commit 41c658c: Merge branch 'OpenNMT:master' into master

2 parents c937f94 + 1251f7c

13 files changed: 515 additions & 15 deletions

`.github/workflows/ci.yml` (26 additions & 3 deletions)

```diff
@@ -109,7 +109,7 @@ jobs:
           -DBUILD_TESTS=ON \
           .
         make -j $(nproc) install
-
+
       - name: Build Ruy
         if: matrix.backend == 'ruy'
         run: |
@@ -121,7 +121,7 @@ jobs:
           $CMAKE_EXTRA_OPTIONS \
           .
         make -j $(nproc) install
-
+
       - name: Download test data
         run: |
           wget https://opennmt-models.s3.amazonaws.com/transliteration-aren-all.tar.gz
@@ -229,7 +229,7 @@ jobs:
          ls -l
          find .
          pip install ${{ matrix.wheel_pattern }}
-
+
       - name: Test Python wheel
         run: |
           pytest -v python/tests/ --ignore=python/tests/test_opennmt_tf.py
@@ -295,6 +295,29 @@ jobs:
        with:
          submodules: recursive

+      - name: Show disk and docker usage (before cleanup)
+        run: |
+          df -h
+          echo " -= Docker System =-"
+          docker system df || true
+
+      - name: Free disk space (cleanup heavy preinstalled directories + docker prune)
+        run: |
+          echo " -= Removing big preinstalled directories (shouldn't remove the needed tools) =-"
+          sudo rm -rf /opt/hostedtoolcache || true
+          sudo rm -rf /usr/share/dotnet || true
+          sudo rm -rf /usr/lib/jvm || true
+          sudo rm -rf /usr/local/lib/android || true
+          echo " -= Running docker prune =-"
+          docker system prune -af --volumes || true
+          docker builder prune -af || true
+
+      - name: Show disk and docker usage (after cleanup)
+        run: |
+          df -h
+          echo " -= Docker System =-"
+          docker system df || true
+
       - name: Build Docker images
         run: |
           ./docker/build_all.sh
```
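The new workflow steps follow a "measure, delete, re-measure" pattern to reclaim runner disk space before the Docker build. A minimal, self-contained sketch of that pattern, exercised on a throwaway directory rather than the runner's preinstalled toolchains (`/tmp/demo_cleanup` is a made-up path used only for illustration):

```shell
# Create some disposable bulk, report its size, then remove it and
# confirm the space is reclaimable -- mirroring the df/rm -rf/df flow above.
mkdir -p /tmp/demo_cleanup
dd if=/dev/zero of=/tmp/demo_cleanup/blob bs=1M count=8 2>/dev/null
du -sh /tmp/demo_cleanup
rm -rf /tmp/demo_cleanup || true
df -h / | tail -1
```

The `|| true` suffix matches the workflow's convention: cleanup is best-effort and must never fail the job.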

`CHANGELOG.md` (14 additions & 1 deletion)

```diff
@@ -4,7 +4,20 @@

 ### Fixes and improvements

-## [v4.6.1](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.1) (2025-10-07)
+## [v4.6.2](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.2) (2025-12-05)
+
+### New features
+
+* Qwen 3 support (#1943) by [@jordimas](https://github.com/jordimas)
+* Gemma 3 text support (#1936) by [@jordimas](https://github.com/jordimas)
+
+### Fixes and improvements
+
+* Fixed pkg_resources Deprecated Warning (#1911) by [@thawancomt](https://github.com/thawancomt)
+* Disable INT8 for sm120 - Blackwell GPUs (#1937) by [@Purfview](https://github.com/Purfview)
+* FIX: package libctranslate2.so in wheel to avoid build fail (#1920) by [@yzewei](https://github.com/yzewei)
+
+## [v4.6.1](https://github.com/OpenNMT/CTranslate2/releases/tag/v4.6.1) (2025-11-07)

 ### New features
```
`docker/Dockerfile` (6 additions & 6 deletions)

```diff
@@ -1,4 +1,4 @@
-FROM nvidia/cuda:12.2.2-cudnn8-devel-ubuntu22.04 as builder
+FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04 as builder

 RUN apt-get update && \
     apt-get install -y --no-install-recommends \
@@ -77,21 +77,21 @@ RUN cd python && \
     python3 -m pip --no-cache-dir install -r install_requirements.txt && \
     python3 setup.py bdist_wheel --dist-dir $CTRANSLATE2_ROOT

-FROM nvidia/cuda:12.2.2-base-ubuntu22.04
+FROM nvidia/cuda:12.4.1-base-ubuntu22.04

 # We remove the cuda-compat package because it conflicts with the CUDA Enhanced Compatibility.
 # See e.g. https://github.com/NVIDIA/nvidia-docker/issues/1515
 RUN apt-get update && \
     apt-get install -y --no-install-recommends \
-        libcublas-12-2 \
-        libcudnn8=8.9.7.29-1+cuda12.2 \
-        libnccl2=2.19.3-1+cuda12.2 \
+        libcublas-12-4 \
+        libcudnn9-cuda-12 \
+        libnccl2 \
         libopenmpi3=4.1.2-2ubuntu1 \
         openmpi-bin \
         libgomp1 \
         python3-pip \
         && \
-    apt-get purge -y cuda-compat-12-2 && \
+    apt-get purge -y cuda-compat-12-4 && \
     apt-get clean && \
     rm -rf /var/lib/apt/lists/*
```
`docs/guides/transformers.md` (80 additions & 1 deletion)

````diff
@@ -8,6 +8,8 @@ CTranslate2 supports selected models from Hugging Face's [Transformers](https://
 * CodeGen
 * DistilBERT
 * Falcon
+* Gemma 2
+* Gemma 3 (text only)
 * Llama
 * M2M100
 * MarianMT
@@ -20,6 +22,8 @@ CTranslate2 supports selected models from Hugging Face's [Transformers](https://
 * GPT-NeoX
 * OPT
 * Pegasus
+* Qwen 2.5
+* Qwen 3
 * T5
 * Whisper
 * XLM-RoBERTa
@@ -80,7 +84,7 @@ print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target), skip_special_tok

 ## BERT

-[BERT](https://huggingface.co/docs/transformers/model_doc/bert) is pretrained model on English language using a masked language modeling objective.
+[BERT](https://huggingface.co/docs/transformers/model_doc/bert) is a model pretrained on English text using a masked language modeling objective.

 CTranslate2 only implements the `BertModel` class from Transformers which includes the Transformer encoder and the pooling layer. Task-specific layers should be run with PyTorch as shown in the example below.

@@ -183,6 +187,43 @@ output = tokenizer.decode(results[0].sequences_ids[0])
 print(output)
 ```

+## Gemma 3 (text only)
+
+[Gemma 3](https://ai.google.dev/gemma/docs/core) is Google's latest family of lightweight, open-weight AI models, built on the same technology as Gemini.
+
+Gemma models come in two flavors: instruction-tuned (it) models and base models.
+
+Instruction-tuned models expect a specific [prompt template format](https://ai.google.dev/gemma/docs/core/prompt-structure), which you should use.
+
+When converting an instruction-tuned model, CTranslate2 sets `<end_of_turn>` as the default end-of-sequence token.
+
+To convert a model:
+
+```bash
+ct2-transformers-converter --model google/gemma-3-1b-it --output_dir gemma-3-1b-it
+```
+
+Gemma 3 usage sample:
+
+```python
+from transformers import AutoTokenizer
+import ctranslate2
+
+tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
+gen = ctranslate2.Generator("gemma-3-1b-it")
+
+prompt = "<start_of_turn>user\nGenerate a 200 word text talking about George Orwell.<end_of_turn>\n<start_of_turn>model\n"
+tokens = tok.convert_ids_to_tokens(tok.encode(prompt))
+
+res = gen.generate_batch([tokens], max_length=2048, sampling_temperature=0.1, include_prompt_in_result=False)
+print(tok.convert_tokens_to_string(res[0].sequences[0]))
+```
+
 ## Llama 2

 [Llama 2](https://ai.meta.com/llama/) is a collection of pretrained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters.
@@ -446,6 +487,44 @@ output = tokenizer.decode(results[0].sequences_ids[0])
 print(output)
 ```

+## Qwen 3
+
+[Qwen 3](https://github.com/QwenLM/Qwen3) is a collection of large language models developed by the Alibaba Group. A key feature is the ability to switch between a "thinking mode" for complex reasoning and a "non-thinking mode" for efficient general chat.
+
+To convert a model:
+
+```bash
+ct2-transformers-converter --model Qwen/Qwen3-4B --quantization float16 --output_dir qwen3-4b-ct2
+```
+
+Usage sample:
+
+You can use the converted model for text generation with `ctranslate2.Generator`. For Qwen 3 instruction-tuned models, use the Hugging Face tokenizer's `apply_chat_template` method to correctly format your prompts, especially when dealing with the optional "thinking mode". MoE model variants are currently not supported.
+
+```python
+import ctranslate2
+import transformers
+
+generator = ctranslate2.Generator("qwen3-4b-ct2")
+tokenizer = transformers.AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")
+
+def generate(prompt):
+    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt, add_special_tokens=False))
+    results = generator.generate_batch([tokens], max_length=2048, sampling_temperature=0.7, include_prompt_in_result=False)
+    return tokenizer.decode(results[0].sequences_ids[0])
+
+prompt_base = """<|im_start|>user
+A train leaves Station A at 60 mph heading towards Station B, 300 miles away. At the same time, another train leaves Station B at 40 mph heading towards Station A. When will they meet and how far from Station A?
+<|im_end|>
+<|im_start|>assistant"""
+
+print("Non-thinking:\n" + "-"*60)
+print(generate(prompt_base + "\n<think></think>\n"))
+
+print("\nThinking:\n" + "="*60)
+print(generate(prompt_base))
+```
+
 ## T5

 [T5](https://huggingface.co/docs/transformers/model_doc/t5) is an encoder-decoder model pre-trained on a multi-task mixture of unsupervised and supervised tasks and for which each task is converted into a text-to-text format.
````
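The Qwen 3 sample hardcodes ChatML-style `<|im_start|>` markup; in practice `apply_chat_template` generates this markup from a list of messages. As a rough, dependency-free illustration of what that formatting looks like for Qwen-style models (the real template lives in the model's tokenizer config, so this hand-rolled builder is an assumption for illustration, not the library API):

```python
def qwen_chat_prompt(messages, add_generation_prompt=True):
    # Build ChatML-style markup of the form used in the Qwen 3 example.
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = qwen_chat_prompt([{"role": "user", "content": "Hello!"}])
print(prompt)
```

For production use, prefer `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` so the formatting always matches the model's own template.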

`include/ctranslate2/layers/attention.h` (3 additions & 0 deletions)

```diff
@@ -2,6 +2,7 @@

 #include "ctranslate2/layers/attention_layer.h"
 #include "ctranslate2/padder.h"
+#include "ctranslate2/layers/transformer.h"

 namespace ctranslate2 {
   namespace layers {
@@ -65,6 +66,8 @@ namespace ctranslate2 {
       dim_t _relative_right_max_position;
       const bool _merge_time_and_head_dims;
       const dim_t _cache_time_dim;
+      std::unique_ptr<const LayerNorm> _q_norm;  // Query normalization
+      std::unique_ptr<const LayerNorm> _k_norm;  // Key normalization
     };
   }
 }
```
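The new `_q_norm` / `_k_norm` members point at per-head query/key normalization ("QK-norm"), which recent architectures such as Gemma 3 and Qwen 3 apply before computing attention scores. A minimal pure-Python sketch of the idea, assuming RMS-style normalization over the head dimension (the shapes, epsilon, and identity scale here are illustrative assumptions, not CTranslate2's implementation):

```python
import math

def rms_norm(vec, weight, eps=1e-6):
    # RMS-normalize one head-dim vector, then apply a learned per-dimension scale.
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [x / rms * w for x, w in zip(vec, weight)]

# One query head vector (head_dim = 4) and an identity scale; the same
# normalization would be applied to key vectors via _k_norm.
q_head = [0.5, -1.0, 2.0, 0.25]
scale = [1.0, 1.0, 1.0, 1.0]

q_normed = rms_norm(q_head, scale)
# After normalization the vector has approximately unit RMS, which keeps
# attention logits in a stable range regardless of the raw projection scale.
print(q_normed)
```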

`python/ctranslate2/__init__.py` (11 additions & 3 deletions)

```diff
@@ -5,10 +5,18 @@
 import glob
 import os

-import pkg_resources
-
 module_name = sys.modules[__name__].__name__
-package_dir = pkg_resources.resource_filename(module_name, "")
+
+# importlib.resources.files is only available on Python >= 3.9.
+try:
+    from importlib.resources import files
+
+    # Replaces the deprecated pkg_resources lookup.
+    package_dir = str(files(module_name))
+except ImportError:
+    import pkg_resources
+
+    package_dir = pkg_resources.resource_filename(module_name, "")

 add_dll_directory = getattr(os, "add_dll_directory", None)
 if add_dll_directory is not None:
```
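The try/except pattern above can be exercised against any importable package. A self-contained sketch using the stdlib `json` package as a stand-in for `ctranslate2`:

```python
try:
    # importlib.resources.files is available from Python 3.9 onward.
    from importlib.resources import files

    def package_dir(name):
        # files() returns a Traversable rooted at the package directory.
        return str(files(name))
except ImportError:
    # Deprecated fallback for older interpreters.
    import pkg_resources

    def package_dir(name):
        return pkg_resources.resource_filename(name, "")

print(package_dir("json"))
```

On Python >= 3.9 only the `importlib.resources` branch runs, so the deprecated `pkg_resources` import (and its warning) is never triggered.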
