Commit fe7a80e

Merge branch 'OpenNMT:master' into master
2 parents 41c658c + b4e155a commit fe7a80e

22 files changed

Lines changed: 765 additions & 230 deletions

.github/workflows/ci.yml

Lines changed: 1 addition & 7 deletions
@@ -170,7 +170,7 @@ jobs:
 CIBW_MANYLINUX_X86_64_IMAGE: manylinux2014
 CIBW_MANYLINUX_AARCH64_IMAGE: manylinux2014
 CIBW_ARCHS: ${{ matrix.arch }}
-CIBW_SKIP: pp* *-musllinux_*
+CIBW_SKIP: "*-musllinux_*"

 - name: Upload Python wheels
   uses: actions/upload-artifact@v4

@@ -195,10 +195,6 @@ jobs:
 artifact_pattern: python-wheels-Linux-aarch64
 wheel_pattern: "*cp310*manylinux*_aarch64.whl"

-#- os: windows-2022
-#  artifact_pattern: python-wheels-Windows-auto64
-#  wheel_pattern: "*cp310*win*.whl"
-
 - os: macos-15
   artifact_pattern: python-wheels-macOS-arm64
   wheel_pattern: "*cp310*macosx*arm64.whl"

@@ -226,8 +222,6 @@ jobs:
 - name: Install wheel
   shell: bash
   run: |
-    ls -l
-    find .
     pip install ${{ matrix.wheel_pattern }}

 - name: Test Python wheel
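The `CIBW_SKIP` change above narrows the skip list from `pp* *-musllinux_*` (PyPy and musl-based builds) to musl builds only, so PyPy wheels are no longer excluded. cibuildwheel matches these space-separated patterns against build identifiers with fnmatch-style globs; a quick sketch of the effect (the build identifiers are illustrative, not an exhaustive list):

```python
from fnmatch import fnmatch

# Hypothetical build identifiers of the kind cibuildwheel generates.
build_ids = [
    "cp310-manylinux_x86_64",
    "cp310-musllinux_x86_64",
    "pp310-manylinux_x86_64",
]

def skipped(ids, patterns):
    # A build is skipped when it matches any space-separated glob pattern.
    return [i for i in ids if any(fnmatch(i, p) for p in patterns.split())]

skipped(build_ids, "pp* *-musllinux_*")  # old value: skips PyPy and musl builds
skipped(build_ids, "*-musllinux_*")      # new value: skips musl builds only
```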

CMakeLists.txt

Lines changed: 2 additions & 2 deletions
@@ -547,8 +547,9 @@ if (WITH_CUDA)
   list(APPEND PRIVATE_INCLUDE_DIRECTORIES ${CUDNN_INCLUDE_DIR})
   list(APPEND LIBRARIES ${CUDNN_LIBRARIES})
   add_definitions(-DCT2_WITH_CUDNN)
+  list(APPEND SOURCES src/ops/conv1d_cudnn_gpu.cu)
 else()
-  message(WARNING "cuDNN library is not enabled: convolution layers will not be supported on GPU")
+  list(APPEND SOURCES src/ops/conv1d_gpu.cu)
 endif()

 if(CUDA_DYNAMIC_LOADING)

@@ -638,7 +639,6 @@ if (WITH_CUDA)
 src/ops/alibi_add_gpu.cu
 src/ops/bias_add_gpu.cu
 src/ops/concat_split_slide_gpu.cu
-src/ops/conv1d_gpu.cu
 src/ops/dequantize_gpu.cu
 src/ops/flash_attention_gpu.cu
 src/ops/gather_gpu.cu
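With this CMake change, a GPU build no longer requires cuDNN for convolution support: the cuDNN-backed `conv1d_cudnn_gpu.cu` is compiled when cuDNN is enabled, and the native `conv1d_gpu.cu` is compiled as a fallback otherwise (previously only a warning was emitted and GPU convolutions were unsupported). A sketch of the selection logic in Python, mirroring the CMake branch:

```python
def conv1d_gpu_source(with_cudnn: bool) -> str:
    # Mirrors the CMakeLists.txt branch above: the source list gets exactly
    # one Conv1D GPU implementation, chosen by cuDNN availability.
    if with_cudnn:
        return "src/ops/conv1d_cudnn_gpu.cu"
    return "src/ops/conv1d_gpu.cu"
```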

CONTRIBUTING.md

Lines changed: 16 additions & 3 deletions
@@ -23,6 +23,19 @@ Do you think a feature is missing or would be a great addition to the project? P
 * look for GitHub issues marked with the *help wanted* label: these are developments that we find particularly suited for community contributions.
 * If you are planning to make a large change to the existing code, consider asking first on [the forum](https://forum.opennmt.net/) to confirm that it is welcome.

+## Contribution rules
+
+CTranslate2 is a low-level, performance-critical codebase. A single misplaced pointer or inefficient memory allocation (which LLMs often get wrong) can take hours to debug.
+
+To maintain code integrity and manage maintainer workload, we apply the following policy:
+
+* Use of AI tools for brainstorming or minor assistance is acceptable, but contributors must explicitly disclose how AI was used and remain fully responsible for correctness, performance, and design. Submissions that appear generated without deep understanding will be declined: verifying AI output for correctness and performance is more time-consuming than writing the code manually.
+
+* Mandatory deep understanding: contributors must fully understand their code and be prepared to justify every part of it.
+
+* Please contribute within your area of expertise. If you are not familiar with the core codebase, consider contributing to documentation, examples, or Hugging Face integrations.
+
 ### Building the sources

 See [Install from sources](https://opennmt.net/CTranslate2/installation.html#install-from-sources).

@@ -85,7 +98,7 @@ The list is ordered on 5. from the largest to smallest time.

 #### `StorageView` class

-CTranslate2 uses [row-major](https://en.wikipedia.org/wiki/Row-_and_column-major_order) storages, usually encapsulated in the `StorageView` class. This class acts like a tensor representation but without the mathematical semantics. It is convenience wrapper to view a buffer of data in a particular shape, and provides methods to resize, reshape, and copy data. The underlying storage has a type (e.g. `float`) and a location (e.g. GPU #1) which are both resolved at runtime.
+CTranslate2 uses [row-major](https://en.wikipedia.org/wiki/Row-_and_column-major_order) storages, usually encapsulated in the `StorageView` class. This class acts like a tensor representation but without the mathematical semantics. It is a convenience wrapper to view a buffer of data in a particular shape, and provides methods to resize, reshape, and copy data. The underlying storage has a type (e.g. `float`) and a location (e.g. GPU #1) which are both resolved at runtime.

 To maximize performance, the implementation avoid new allocations when possible:

@@ -144,7 +157,7 @@ To limit the size of the packages pushed to PyPI, some libraries are not include

 One of the benefits of this dynamic loading is that multiple versions of cuBLAS and cuDNN are supported by the same binary. In particular, users can install any CUDA 12.x version as long as it provides `libcublas.so.12`.

-The Python library only support CUDA 12.x. C++ source code is always compatible with CUDA 11, possible to use CUDA 11 libraries during compilation to create CUDA 11.x support wheel.
+The Python library only supports CUDA 12.x. The C++ source code remains compatible with CUDA 11, so it is possible to compile against CUDA 11 libraries to build wheels with CUDA 11.x support.

 ### Updating other dependencies

@@ -161,7 +174,7 @@ If a dependency needs an update, it is particularly important that it is updated

 ### Managing PyPI project size limit

-Projects on PyPI have a size limit. The default limit is 10GB and [we already requested](https://github.com/pypi/support/issues/1480) an increase to 20GB in the past. Because increase requests can take several months to be accepted, we now try to work with this 20GB limit.
+Projects on PyPI have a size limit. The default limit is 10GB. Currently the CTranslate2 project has a [50GB storage limit](https://github.com/pypi/support/issues/8119).

 So older releases need to be regularly deleted on PyPI to make room for new releases. **However, make sure to keep the latest release of each major version.**
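The `StorageView` paragraph in the diff above mentions row-major storage: the last index varies fastest in the underlying buffer. The flat offset of a multi-dimensional index can be sketched as follows (a hypothetical helper for illustration, not part of the CTranslate2 API):

```python
def row_major_offset(shape, index):
    # Horner-style accumulation: offset = ((i0 * d1 + i1) * d2 + i2) ...
    # where d1, d2, ... are the trailing dimensions of the shape.
    offset = 0
    for dim, i in zip(shape, index):
        offset = offset * dim + i
    return offset

row_major_offset((2, 3, 4), (1, 2, 3))  # element [1][2][3] of a 2x3x4 buffer -> 23
```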

README.md

Lines changed: 10 additions & 0 deletions
@@ -119,6 +119,16 @@ Executed with 4 threads on a [*c5.2xlarge*](https://aws.amazon.com/ec2/instance-

 Executed with CUDA 11 on a [*g5.xlarge*](https://aws.amazon.com/ec2/instance-types/g5/) Amazon EC2 instance equipped with a NVIDIA A10G GPU (driver version: 510.47.03).

+## Contributing
+
+CTranslate2 is a community-driven project. We welcome contributions of all kinds:
+* **New Model Support:** Help us implement more Transformer architectures.
+* **Performance:** Propose optimizations for CPU or GPU kernels.
+* **Bug Reports:** Open an issue if you find something not working as expected.
+* **Documentation:** Improve our guides or add new examples.
+
+Check out our [Contributing Guide](CONTRIBUTING.md) to learn how to set up your development environment.
+
 ## Additional resources

 * [Documentation](https://opennmt.net/CTranslate2)

include/ctranslate2/batch_reader.h

Lines changed: 8 additions & 1 deletion
@@ -56,7 +56,8 @@ namespace ctranslate2 {

     std::vector<Example>
     get_next(const size_t max_batch_size,
-             const BatchType batch_type = BatchType::Examples);
+             const BatchType batch_type = BatchType::Examples,
+             const bool consider_padding = false);

     // Consumes and returns the next example.
     virtual Example get_next_example() = 0;

@@ -67,6 +68,12 @@ namespace ctranslate2 {
     }

   private:
+    std::vector<Example> fill_batch_with_fixed_increment(const size_t max_batch_size,
+                                                         const BatchType batch_type);
+
+    std::vector<Example> fill_batch_with_variable_increment(const size_t max_batch_size,
+                                                            const BatchType batch_type);
+
     bool _initialized = false;
     Example _next;
   };
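The new `consider_padding` flag suggests two ways of costing a batch when `max_batch_size` counts tokens: either the raw token count, or the padded count where every example pays for the longest sequence in the batch. A hedged sketch of that distinction (an illustrative helper, not the actual C++ implementation):

```python
def batch_cost_in_tokens(lengths, consider_padding=False):
    # Without padding: total real tokens in the batch.
    # With padding: the batch is padded to its longest sequence,
    # so every example costs max(lengths) tokens.
    if not lengths:
        return 0
    if consider_padding:
        return max(lengths) * len(lengths)
    return sum(lengths)
```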

python/cpp/generator.cc

Lines changed: 4 additions & 4 deletions
@@ -234,10 +234,10 @@ namespace ctranslate2 {
     Arguments:
       start_tokens: Batch of start tokens. If the decoder starts from a special
         start token like ``<s>``, this token should be added to this input.
-      max_batch_size: The maximum batch size. If the number of inputs is greater than
-        :obj:`max_batch_size`, the inputs are sorted by length and split by chunks of
-        :obj:`max_batch_size` examples so that the number of padding positions is
-        minimized.
+      max_batch_size: The maximum batch size. If the number of inputs is greater than :obj:`max_batch_size`,
+        the inputs are sorted by length and split by chunks of :obj:`max_batch_size` examples
+        (or tokens when :obj:`batch_type`="tokens") so that the number of padding positions
+        is minimized.
       batch_type: Whether :obj:`max_batch_size` is the number of "examples" or "tokens".
       asynchronous: Run the generation asynchronously.
       beam_size: Beam size (1 for greedy search).
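The updated docstring describes the batching strategy: sort the inputs by length, then split them into chunks of at most `max_batch_size` examples or tokens. A minimal sketch of that strategy, assuming each input is a token list (illustrative, not the library's internal code):

```python
def make_batches(examples, max_batch_size, batch_type="examples"):
    # Length-sorted chunks group sequences of similar length together,
    # which minimizes the number of padding positions per batch.
    order = sorted(range(len(examples)), key=lambda i: len(examples[i]))
    batches, current, current_tokens = [], [], 0
    for i in order:
        tokens = len(examples[i])
        if batch_type == "examples":
            cost = len(current) + 1
        else:  # batch_type == "tokens"
            cost = current_tokens + tokens
        if current and cost > max_batch_size:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches  # batches of indices into the original inputs
```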

python/cpp/translator.cc

Lines changed: 4 additions & 4 deletions
@@ -372,10 +372,10 @@ namespace ctranslate2 {
     Arguments:
       source: Batch of source tokens.
       target_prefix: Optional batch of target prefix tokens.
-      max_batch_size: The maximum batch size. If the number of inputs is greater than
-        :obj:`max_batch_size`, the inputs are sorted by length and split by chunks of
-        :obj:`max_batch_size` examples so that the number of padding positions is
-        minimized.
+      max_batch_size: The maximum batch size. If the number of inputs is greater than :obj:`max_batch_size`,
+        the inputs are sorted by length and split by chunks of :obj:`max_batch_size` examples
+        (or tokens when :obj:`batch_type`="tokens") so that the number of padding positions
+        is minimized.
       batch_type: Whether :obj:`max_batch_size` is the number of "examples" or "tokens".
       asynchronous: Run the translation asynchronously.
       beam_size: Beam size (1 for greedy search).

python/ctranslate2/converters/fairseq.py

Lines changed: 3 additions & 1 deletion
@@ -146,7 +146,9 @@ def _load(self):
         import_user_module(argparse.Namespace(user_dir=self._user_dir))

         with torch.no_grad():
-            checkpoint = checkpoint_utils.load_checkpoint_to_cpu(self._model_path)
+            checkpoint = torch.load(
+                self._model_path, map_location=torch.device("cpu"), weights_only=False
+            )
             args = checkpoint["args"] or checkpoint["cfg"]["model"]

             args.data = self._data_dir
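Both converter changes in this commit pass `weights_only=False` to `torch.load`. Recent PyTorch versions (2.6 and later) default `weights_only` to `True`, which restricts unpickling to tensor data; fairseq and OpenNMT-py checkpoints also pickle plain Python objects such as the `argparse.Namespace` holding the model config, so the full unpickler is required, and the checkpoint must therefore come from a trusted source. A minimal round-trip sketch (the helper name is illustrative):

```python
import argparse
import os
import tempfile

import torch

def load_checkpoint_cpu(path):
    # weights_only=False restores arbitrary pickled objects (e.g. an
    # argparse.Namespace with the model config), not just tensors.
    # Only use it on checkpoints you trust.
    return torch.load(path, map_location=torch.device("cpu"), weights_only=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pt")
    torch.save({"args": argparse.Namespace(data="corpus")}, path)
    checkpoint = load_checkpoint_cpu(path)
```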

python/ctranslate2/converters/opennmt_py.py

Lines changed: 3 additions & 1 deletion
@@ -174,7 +174,9 @@ def __init__(self, model_path: str):
     def _load(self):
         import torch

-        checkpoint = torch.load(self._model_path, map_location="cpu")
+        checkpoint = torch.load(
+            self._model_path, map_location="cpu", weights_only=False
+        )

         src_vocabs, tgt_vocabs = get_vocabs(checkpoint["vocab"])
