[pyTorch] Replace the make_empty implementation with the C++ implementation (#2666)
* Replace the make_empty implementation with the C++ implementation for the
known quantizers
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix lint
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Handle the device passed as string
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fix
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Fixes
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix duplicate create_empty_quantized_tensor after merge
The merge with main introduced a duplicate function definition,
declaration, and pybind registration for create_empty_quantized_tensor.
Remove the duplicates.
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* Fix device index resolution in create_tensor
Change the device parameter from at::Device with default torch::kCUDA
to std::optional<at::Device> with default nullopt. When no device is
specified, resolve to the current CUDA device via
c10::cuda::current_device(), ensuring the device always has a valid
index. This fixes autograd engine assertions when tensors created
without an explicit device are used in backward passes.
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
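A minimal Python sketch of the device-resolution rule described above, for illustration only. It assumes a simplified, hypothetical `Device` type in place of `at::Device` and a hypothetical `current_device` callable standing in for `c10::cuda::current_device()`. Parsing strings such as `"cuda:1"` (see "Handle the device passed as string") and filling in the index of a bare `"cuda"` device are assumptions, not a transcription of the C++ change.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for a (type, index) device pair such as at::Device."""
    type: str
    index: Optional[int] = None


def resolve_device(
    device: Union[None, str, Device],
    current_device: Callable[[], int],
) -> Device:
    """Return a device whose CUDA index is always set.

    `current_device` is a hypothetical hook standing in for
    c10::cuda::current_device().
    """
    if device is None:
        # No device given (the std::nullopt case): use the current CUDA device.
        return Device("cuda", current_device())
    if isinstance(device, str):
        # Assumption: accept strings such as "cuda", "cuda:1", or "cpu".
        kind, _, idx = device.partition(":")
        device = Device(kind, int(idx) if idx else None)
    if device.type == "cuda" and device.index is None:
        # Assumption: a bare "cuda" device also gets the current index,
        # so downstream code never sees a CUDA device without one.
        return Device("cuda", current_device())
    return device
```

For example, `resolve_device(None, lambda: 2)` yields `Device("cuda", 2)`, while an explicit `"cuda:1"` keeps index 1 regardless of the current device.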
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Guard make_empty for custom quantizers without C++ converter
Custom quantizers that set self.custom = True and don't override
make_empty() will now get a clear NotImplementedError instead of
hitting an opaque C++ NVTE_ERROR("Unexpected type for quantizer").
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
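A minimal sketch of the guard described above, assuming a hypothetical `Quantizer` base class and a hypothetical `_cpp_make_empty` function standing in for the C++ binding; neither is the project's actual code. Quantizers with no C++ converter (`custom = True`) fail with `NotImplementedError` unless they override `make_empty()`.

```python
class Quantizer:
    """Hypothetical base class; `custom` marks quantizers without a C++ converter."""

    custom: bool = False

    def make_empty(self, shape):
        if self.custom:
            # Fail early with a clear Python error instead of reaching the C++
            # path, which would raise NVTE_ERROR("Unexpected type for quantizer").
            raise NotImplementedError(
                f"{type(self).__name__} sets custom = True and must override make_empty()"
            )
        return _cpp_make_empty(self, shape)


def _cpp_make_empty(quantizer, shape):
    """Hypothetical stand-in for the C++ binding that allocates the empty tensor."""
    return ("empty", type(quantizer).__name__, tuple(shape))
```

A custom subclass that overrides `make_empty()` never reaches the guard, so only the unhandled case changes behavior.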
* Fix the case where the device is taken from the passed data
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Signed-off-by: Przemek Tredak <ptredak@nvidia.com>
Signed-off-by: vthumbe1503 <vthumbe@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: vthumbe1503 <vthumbe@nvidia.com>