Open
78 commits
b26e70f
aimet
kozlov721 Mar 23, 2026
ba56cb9
ptq
kozlov721 Mar 29, 2026
4bf1bcf
Merge branch 'main' into feature/aimet
kozlov721 Mar 29, 2026
6e506ce
fix for pickle
kozlov721 Mar 29, 2026
0e3673b
ignoring quant modules
kozlov721 Mar 29, 2026
9cd6826
ptq support
kozlov721 Mar 30, 2026
3d8828a
track
kozlov721 Mar 31, 2026
7dca296
more params
kozlov721 Mar 31, 2026
58565e4
aimet config
kozlov721 Mar 31, 2026
40518c0
requirements
kozlov721 Mar 31, 2026
fa33a8d
serialization
kozlov721 Apr 1, 2026
c051144
more ptq techniques
kozlov721 Apr 1, 2026
4923c8e
small fixes
kozlov721 Apr 2, 2026
cb07292
fix keypoint loss
kozlov721 Apr 5, 2026
9dd61d8
optimizer
kozlov721 Apr 5, 2026
d45abc4
config options
kozlov721 Apr 5, 2026
22c98a1
loading config values
kozlov721 Apr 7, 2026
eafbcf9
cleaned up
kozlov721 Apr 7, 2026
59936ec
docs
kozlov721 Apr 7, 2026
ea07550
test
kozlov721 Apr 7, 2026
ea3f8cd
fix tests
kozlov721 Apr 7, 2026
2716b1e
fixes
kozlov721 Apr 7, 2026
9bccbc8
fixed config
kozlov721 Apr 7, 2026
d200891
updated api
kozlov721 Apr 7, 2026
d34ca91
cleanup
kozlov721 Apr 7, 2026
1140c36
cli fix
kozlov721 Apr 7, 2026
fa5c934
scheduler step remove epoch
kozlov721 Apr 7, 2026
5180a5e
fix qat
kozlov721 Apr 7, 2026
87f98b5
remmoved hidden node
kozlov721 Apr 8, 2026
ffa5bda
device switching
kozlov721 Apr 8, 2026
c359722
docs
kozlov721 Apr 8, 2026
b77c126
renamed
kozlov721 Apr 8, 2026
068d8f1
removed ignores
kozlov721 Apr 8, 2026
5769e46
updated tests
kozlov721 Apr 8, 2026
fb72ec6
affine quant
kozlov721 Apr 8, 2026
1e95e2a
reordered
kozlov721 Apr 8, 2026
5c07a31
Merge branch 'main' into feature/aimet
kozlov721 Apr 8, 2026
2476ef7
fix types
kozlov721 Apr 8, 2026
faef813
fix config test
kozlov721 Apr 8, 2026
51f897d
requirements
kozlov721 Apr 8, 2026
5f5ddfa
removed quant ignore
kozlov721 Apr 8, 2026
4c0d8d6
simplify
kozlov721 Apr 8, 2026
b786595
fix readme
kozlov721 Apr 8, 2026
8dffdec
fixed optimizer
kozlov721 Apr 8, 2026
5b09694
req
kozlov721 Apr 8, 2026
67953c4
separated requerements
kozlov721 Apr 8, 2026
8ff59a6
updated ci
kozlov721 Apr 8, 2026
d249b57
readme update
kozlov721 Apr 8, 2026
1b3b8c1
fix test
kozlov721 Apr 8, 2026
bb379ff
helper
kozlov721 Apr 9, 2026
6476482
required aimet fields
kozlov721 Apr 9, 2026
34c0fdd
fix test
kozlov721 Apr 13, 2026
ad3e379
docs
kozlov721 Apr 13, 2026
bed8285
Merge branch 'main' into feature/aimet
kozlov721 Apr 13, 2026
06a6c2c
fix prediction
kozlov721 Apr 13, 2026
fc73aec
fix test
kozlov721 Apr 13, 2026
96f42da
fix keypoint and anomaly
kozlov721 Apr 13, 2026
d5fefe5
fix deepcopy
kozlov721 Apr 13, 2026
5993a6f
seq mse
kozlov721 Apr 13, 2026
24ac150
fix tests
kozlov721 Apr 13, 2026
478046c
updated import
kozlov721 Apr 13, 2026
3fc8cb4
fix test
kozlov721 Apr 13, 2026
e7b4a6e
updated svg
kozlov721 Apr 14, 2026
4b74f04
Merge branch 'main' into feature/aimet
kozlov721 May 12, 2026
a5558a4
fix test
kozlov721 May 12, 2026
d14f285
fix
kozlov721 May 12, 2026
70f6675
fix readme
kozlov721 May 19, 2026
ed6ed80
fix link
kozlov721 May 19, 2026
1d110ce
update
kozlov721 May 19, 2026
c06d002
removed requirement
kozlov721 May 19, 2026
0162f88
fix tests
kozlov721 May 20, 2026
bb7f28c
change requirements
kozlov721 May 21, 2026
25656ab
fix test
kozlov721 May 21, 2026
7bcd04b
fix readme
kozlov721 May 21, 2026
c309012
moved outside of loop
kozlov721 May 21, 2026
a1fb63b
loading weights before eval
kozlov721 May 21, 2026
4dc9884
safer values
kozlov721 May 21, 2026
70faf84
fix docs
kozlov721 May 21, 2026
4 changes: 2 additions & 2 deletions .github/workflows/ci.yaml
Original file line number Diff line number Diff line change
@@ -68,7 +68,7 @@ jobs:
cache: pip

- name: Install dependencies
run: pip install -e .[dev]
run: pip install -e .[dev,aimet] --extra-index-url https://download.pytorch.org/whl/cu130

- name: Install dev version of LuxonisML
if: startsWith(github.head_ref, 'release/') == false
@@ -147,7 +147,7 @@ jobs:
cache: pip

- name: Install dependencies
run: pip install -e .[dev]
run: pip install -e .[dev,aimet] --extra-index-url https://download.pytorch.org/whl/cu130

- name: Install dev version of LuxonisML
if: startsWith(github.head_ref, 'release/') == false
10 changes: 10 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -111,6 +111,15 @@ pip install luxonis-train

This will also install the `luxonis_train` CLI. For more information on how to use it, see [CLI Usage](#cli).

### AIMET Quantization Support

To enable support for AIMET quantization, install the `luxonis-train[aimet]` extra:

```bash
pip install luxonis-train[aimet] --extra-index-url https://download.pytorch.org/whl/cu130

```

<a name="usage"></a>

## 📝 Usage
@@ -135,6 +144,7 @@ The CLI is the most straightforward way how to use `LuxonisTrain`. The CLI provi
- `tune` - Tune the hyperparameters of the model for better performance
- `inspect` - Inspect the dataset you are using and visualize the annotations
- `annotate` - Annotate a directory using the model’s predictions and generate a new LDF.
- `quantize` - Quantize the model using `AIMET` quantization techniques

**To get help on any command:**

36 changes: 36 additions & 0 deletions configs/README.md
Original file line number Diff line number Diff line change
@@ -510,6 +510,7 @@ Here you can define configuration for exporting.
| `onnx` | `dict` | `{}` | Options specific for ONNX export. See [ONNX](#onnx) section for details |
| `hubai` | `dict` | `{}` | Options for HubAI SDK conversion. See [HubAI](#hubai) section for details |
| `blobconverter` | `dict` | `{}` | Options for converting to BLOB format (deprecated). See [Blob](#blob-deprecated) section |
| `aimet` | `dict` | `{}` | Options for AIMET quantization. See [AIMET](#aimet) |

### `ONNX`

@@ -571,6 +572,41 @@ exporter:
shaves: 8
```

### `AIMET`

The [AIMET](https://quic.github.io/aimet-pages/releases/latest/index.html) (AI Model Efficiency Toolkit) provides quantization and model export tools.

| Key | Type | Default value | Description |
| -------------------------- | ------------------------------------------------- | -------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `active` | `bool` | `False` | Whether to use AIMET for quantization and export |
| `epochs` | `int` | `20` | Number of epochs to use for quantization-aware training |
| `default_output_bw` | `int` | `8` | Default bitwidth for quantized activations (layer inputs and outputs) |
| `default_param_bw` | `int` | `8` | Default bitwidth for quantized parameters |
| `default_data_type` | `Literal["int", "float"]` | `int` | Default data type for quantized values |
| `quant_scheme` | `Literal["min_max", "post_training_tf_enhanced"]` | `min_max` | Quantization scheme to use |
| `config` | `dict \| str` | `{}` | Additional configuration for AIMET. Can be a dictionary or a path to a JSON config file. Refer to the [AIMET documentation](https://quic.github.io/aimet-pages/releases/latest/techniques/runtime_config.html) for details on the available options. |
| `fold_batch_norms` | `bool` | `False` | Whether to fold batch normalization layers before quantization |
| `cross_layer_equalization` | `bool` | `False` | Whether to perform cross-layer equalization before quantization |
| `batch_norm_reestimation` | `bool` | `False` | Whether to perform batch norm re-estimation after quantization |
| `sequential_mse` | `bool` | `False` | Whether to perform sequential MSE optimization. |
| `optimizer` | `dict` | `{"name": "SGD", "params": {"lr": 1e-5}}` | Optimizer configuration for quantization-aware training. See [Optimizer](#optimizer) section for details and examples. |
| `scheduler` | `dict` | `{"name": "StepLR", "params": {"step_size": 5, "gamma": 0.1}}` | Scheduler configuration for quantization-aware training. See [Scheduler](#scheduler) section for details and examples. |
| `adaround` | `dict` | `{}` | Configuration for AdaRound weight rounding. See [AdaRound](#adaround) for more details. |

#### AdaRound

Adaptive rounding (AdaRound) is a rounding mechanism for model weights designed to adapt to the data to improve the accuracy of the quantized model.

By default, AIMET uses nearest rounding for quantization, in which weight values are quantized to the nearest integer value. AdaRound, however, uses training data to determine how to round quantized weights. This technique often improves the accuracy of the quantized model.

| Key | Type | Default value | Description |
| ------------------------ | ----------------- | ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `active` | `bool` | `False` | Whether to use AdaRound for weight rounding during quantization |
| `default_num_iterations` | `int \| None` | `None` | Number of iterations for the AdaRound optimization. When `None`, the default is 10K for models with weights of 8 bits or more, and 15K for models with weights below 8 bits. |
| `default_reg_param` | `float` | `0.01` | Regularization parameter that trades off rounding loss against reconstruction loss. |
| `default_beta_range` | `tuple[int, int]` | `(20, 2)` | Start and stop beta parameter for annealing of rounding loss (start_beta, end_beta). |
| `default_warm_start` | `float` | `0.2` | The warm up period, during which rounding loss has zero effect. |

## Tuner

Here you can specify options for tuning.
25 changes: 25 additions & 0 deletions luxonis_train/__main__.py
Original file line number Diff line number Diff line change
@@ -154,6 +154,8 @@ def get_visualization_item(
return np_images, np_labels

images, labels = loader[idx]
if not isinstance(images, dict):
images = {loader.image_source: images}
return (
{
name: image.numpy().transpose(1, 2, 0)
@@ -480,6 +482,29 @@ def convert(
).convert(save_dir=save_dir, weights=weights)


@app.command(group=export_group, sort_key=1)
def quantize(
opts: list[str] | None = None,
/,
*,
config: str | None = None,
weights: str | None = None,
):
"""Quantize the model using AIMET.

@type config: str
@param config: Path to the configuration file.
@type weights: str
@param weights: Path to the model weights.
@type opts: list[str]
@param opts: A list of optional CLI overrides of the config file.
"""
model = create_model(
config, opts, weights=weights, allow_empty_dataset=True
)
model.quantize()


@upgrade_app.command()
def config(
config: Annotated[
51 changes: 35 additions & 16 deletions luxonis_train/attached_modules/losses/adaptive_detection_loss.py
Original file line number Diff line number Diff line change
@@ -29,6 +29,7 @@ class AdaptiveDetectionLoss(BaseLoss):
n_anchors_list: list[int]
stride_tensor: Tensor
gt_bboxes_scale: Tensor
anchor_points_strided: Tensor

def __init__(
self,
@@ -102,6 +103,19 @@ def __init__(
self.class_loss_weight = class_loss_weight
self.iou_loss_weight = iou_loss_weight

self.register_buffer(
"gt_bboxes_scale",
torch.tensor(
[
self.original_img_size[1],
self.original_img_size[0],
self.original_img_size[1],
self.original_img_size[0],
],
),
persistent=False,
)

self._logged_assigner_change = False

def forward(
@@ -163,30 +177,35 @@ def forward(
return loss, sub_losses

def _init_parameters(self, features: list[Tensor]) -> None:
if not hasattr(self, "gt_bboxes_scale"):
self.gt_bboxes_scale = torch.tensor(
[
self.original_img_size[1],
self.original_img_size[0],
self.original_img_size[1],
self.original_img_size[0],
],
device=features[0].device,
)
if not hasattr(self, "anchors"):
(
self.anchors,
self.anchor_points,
self.n_anchors_list,
self.stride_tensor,
anchors,
anchor_points,
n_anchors_list,
stride_tensor,
) = anchors_for_fpn_features(
features,
self.stride,
self.grid_cell_size,
self.grid_cell_offset,
multiply_with_stride=True,
)
self.anchor_points_strided = (
self.anchor_points / self.stride_tensor
self.register_buffer("anchors", anchors, persistent=False)
self.register_buffer(
"anchor_points", anchor_points, persistent=False
)
self.register_buffer(
"n_anchors_list",
torch.tensor(n_anchors_list),
persistent=False,
)
self.register_buffer(
"stride_tensor", stride_tensor, persistent=False
)
self.register_buffer(
"anchor_points_strided",
anchor_points / stride_tensor,
persistent=False,
)

def _run_assigner(
Original file line number Diff line number Diff line change
@@ -17,8 +17,6 @@
from luxonis_train.utils.boundingbox import IoUType
from luxonis_train.utils.keypoints import insert_class

from .bce_with_logits import BCEWithLogitsLoss


class EfficientKeypointBBoxLoss(AdaptiveDetectionLoss):
node: EfficientKeypointBBoxHead
@@ -74,9 +72,7 @@ def __init__(
**kwargs,
)

self.b_cross_entropy = BCEWithLogitsLoss(
pos_weight=torch.tensor([viz_pw])
)
self.pos_weight = torch.tensor([viz_pw])
Collaborator Author

Keeping a reference to another BaseLoss that is not attached to a specific node is problematic when copying the model. In this case it's easier to use F.binary_cross_entropy_with_logits directly instead of our BCEWithLogitsLoss.
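
As a rough illustration of that swap (a sketch, not the PR's code): the functional call should give the same value as a loss module, while the owning loss only has to store the `pos_weight` tensor. Plain `torch.nn.BCEWithLogitsLoss` stands in for the project's own wrapper here, and the tensor values are made up.

```python
import torch
import torch.nn.functional as F
from torch import nn

# Made-up logits and visibility targets, for illustration only.
logits = torch.tensor([0.5, -1.0, 2.0])
targets = torch.tensor([1.0, 0.0, 1.0])
pos_weight = torch.tensor([2.0])

# Module form: the owning loss would have to keep this object as an attribute.
module_loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)

# Functional form: only the pos_weight tensor needs to be stored.
functional_loss = F.binary_cross_entropy_with_logits(
    logits, targets, pos_weight=pos_weight
)

assert torch.allclose(module_loss, functional_loss)
```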

self.sigmas = get_sigmas(
sigmas=sigmas, n_keypoints=self.n_keypoints, caller_name=self.name
)
@@ -85,6 +81,13 @@ def __init__(
)
self.regr_kpts_loss_weight = regr_kpts_loss_weight
self.vis_kpts_loss_weight = vis_kpts_loss_weight
self.register_buffer(
"gt_kpts_scale",
torch.tensor(
[self.original_img_size[1], self.original_img_size[0]],
),
persistent=False,
)

def forward(
self,
@@ -95,14 +98,14 @@ def forward(
target_boundingbox: Tensor,
target_keypoints: Tensor,
) -> tuple[Tensor, dict[str, Tensor]]:
self._init_parameters(features)

device = keypoints_raw.device
target_keypoints = insert_class(target_keypoints, target_boundingbox)

batch_size = class_scores.shape[0]
n_kpts = (target_keypoints.shape[1] - 2) // 3

self._init_parameters(features)

pred_bboxes = dist2bbox(distributions, self.anchor_points_strided)
keypoints_raw = self.dist2kpts_noscale(
self.anchor_points_strided,
@@ -124,7 +127,7 @@ def forward(
scaled_raw_keypoints = keypoints_raw.clone()
scaled_raw_keypoints[..., :2] = scaled_raw_keypoints[
..., :2
] * self.stride_tensor.view(1, -1, 1, 1)
] * self.stride_tensor.clone().view(1, -1, 1, 1)

sigmas = self.sigmas.to(device)

@@ -190,8 +193,11 @@ def forward(
regression_loss = (
((1 - torch.exp(-e)) * mask).sum(dim=1) / (mask.sum(dim=1) + 1e-9)
).mean()
visibility_loss = self.b_cross_entropy.forward(
keypoints_raw[..., 2], mask

visibility_loss = F.binary_cross_entropy_with_logits(
keypoints_raw[..., 2],
mask,
pos_weight=self.pos_weight.clone().to(device),
)

one_hot_label = F.one_hot(assigned_labels.long(), self.n_classes + 1)[
@@ -264,12 +270,3 @@ def dist2kpts_noscale(self, anchor_points: Tensor, kpts: Tensor) -> Tensor:
adj_kpts[..., 0] += x_adj
adj_kpts[..., 1] += y_adj
return adj_kpts

def _init_parameters(self, features: list[Tensor]) -> None:
Collaborator Author

Parameters created this way were causing hard-to-diagnose errors during quantization because the tensors were created while torch.inference_mode was active.
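
A minimal sketch of the alternative, assuming a made-up `ScaledLoss` module rather than the project's loss classes: a buffer registered in `__init__` is created before any forward pass, so it stays an ordinary tensor even when `forward` later runs under `torch.inference_mode()`. With `persistent=False` it is also left out of the `state_dict`.

```python
import torch
from torch import nn


class ScaledLoss(nn.Module):  # hypothetical module, not from the PR
    def __init__(self, img_size: tuple[int, int]) -> None:
        super().__init__()
        height, width = img_size
        # Created eagerly in __init__, so it is an ordinary tensor even if
        # forward() later runs under torch.inference_mode().
        self.register_buffer(
            "scale", torch.tensor([width, height]), persistent=False
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        return points * self.scale


loss = ScaledLoss((480, 640))
with torch.inference_mode():
    scaled = loss(torch.ones(2))

buffer_is_inference = loss.scale.is_inference()
in_state_dict = "scale" in loss.state_dict()
```

Because the tensor is a buffer, it also follows the module across devices on `.to(...)`, which removes the need for a `device=` argument at creation time.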

if hasattr(self, "gt_kpts_scale"):
return
super()._init_parameters(features)
self.gt_kpts_scale = torch.tensor(
[self.original_img_size[1], self.original_img_size[0]],
device=features[0].device,
)
Original file line number Diff line number Diff line change
@@ -92,7 +92,7 @@ def forward(self, img1: Tensor, img2: Tensor) -> Tensor:

(_, channel, _, _) = img1.size()
if channel == self.channel and self.window.dtype == img1.dtype:
window = self.window.to(device)
window = self.window.to(device).clone()
Collaborator Author

Cloning is another way to fix the "tensors created in inference mode" issue.
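
The effect can be reproduced in isolation. The sketch below uses made-up tensors and is not taken from the PR: a tensor created under `torch.inference_mode()` is rejected when autograd needs to save it for the backward pass, whereas a clone taken outside inference mode behaves like a normal tensor.

```python
import torch

# A tensor created while inference mode is active is an "inference tensor".
with torch.inference_mode():
    window = torch.ones(3)

weight = torch.ones(3, requires_grad=True)

# Using it in a computation that records gradients fails.
try:
    (weight * window).sum().backward()
    failed = False
except RuntimeError:
    failed = True

# Cloning outside inference mode yields a normal tensor that autograd accepts.
safe_window = window.clone()
(weight * safe_window).sum().backward()
```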

else:
window = (
create_window(self.window_size, channel)
6 changes: 6 additions & 0 deletions luxonis_train/attached_modules/metrics/base_metric.py
Original file line number Diff line number Diff line change
@@ -164,6 +164,12 @@ def compute(
"""
return super().compute()

def __eq__(self, other: object) -> bool:
Collaborator Author

This fixes an issue with fold_all_batch_norms.

torchmetrics.Metric supports chaining individual metrics into larger pipelines using overloaded math operators.

For example you can do:

from torchmetrics import Precision, Recall

precision = Precision(task="binary")
recall = Recall(task="binary")

# Operator overloading on the classes
f1_score = 2 * (precision * recall) / (precision + recall)

The f1_score works the same as the official torchmetrics.F1Score, but it was created by pipelining smaller metrics.

from torch import tensor
from torchmetrics import F1Score

official_f1_score = F1Score(task="binary")

f1_score.update(
    tensor([0, 1, 1, 0, 0, 1]), tensor([0, 1, 0, 1, 1, 1])
)
print(f"F1 Score: {f1_score.compute()}")
# F1 Score: 0.5714

official_f1_score.update(
    tensor([0, 1, 1, 0, 0, 1]), tensor([0, 1, 0, 1, 1, 1])
)
print(f"Official F1 Score: {official_f1_score.compute()}")
# Official F1 Score: 0.5714

print(f1_score)
# CompositionalMetric(
#   true_divide(
#     CompositionalMetric(
#       mul(
#         2,
#         CompositionalMetric(
#           mul(
#             BinaryPrecision(),
#             BinaryRecall()
#           )
#         )
#       )
#     ),
#     CompositionalMetric(
#       add(
#         BinaryPrecision(),
#         BinaryRecall()
#       )
#     )
#   )
# )

This is convenient, but the same overloading applies to == and != as well, including comparisons between a torchmetrics.Metric and other types:

foo = precision != 2
print(foo)
# CompositionalMetric(
#   ne(
#     BinaryPrecision(),
#     2
#   )
# )

This has one major disadvantage:

foo = precision != 2
print(bool(foo))
# True

if precision == None:
    print("This should not happen")
# This should not happen

This breaks AIMET, which expects comparisons to return plain booleans, so in order to fix it we have to re-implement __eq__ (and __hash__) ourselves.
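
The pattern can be shown without torchmetrics. In this sketch, `ChainableMetric` and `Composed` are hypothetical stand-ins that only mimic the operator overloading described above, and `FixedMetric` applies the identity-based override.

```python
class Composed:
    """Stand-in for a composite object produced by an overloaded operator."""

    def __init__(self, op: str, left: object, right: object) -> None:
        self.op, self.left, self.right = op, left, right


class ChainableMetric:  # hypothetical; mimics the operator overloading
    def __eq__(self, other: object) -> Composed:  # type: ignore[override]
        # Returns an object, which is always truthy, instead of a bool.
        return Composed("eq", self, other)

    # Defining __eq__ alone would disable hashing, so restore the default.
    __hash__ = object.__hash__


class FixedMetric(ChainableMetric):
    def __eq__(self, other: object) -> bool:
        # Identity comparison returns a real bool.
        return self is other

    def __hash__(self) -> int:
        return id(self)


broken, fixed = ChainableMetric(), FixedMetric()
broken_equals_none = bool(broken == None)  # noqa: E711
fixed_equals_none = fixed == None  # noqa: E711
```

`__hash__` has to be defined alongside `__eq__` because Python sets `__hash__` to `None` on any class that overrides `__eq__` without it.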

return self is other

def __hash__(self) -> int:
return id(self)

@cached_property
def _signature(self) -> dict[str, Parameter]:
return get_signature(self.update)
Original file line number Diff line number Diff line change
@@ -51,6 +51,9 @@ def compute(self) -> dict[str, Tensor]:
}

def _update(self, predictions: list[Tensor], targets: Tensor) -> None:
if self.confusion_matrix.is_inference():
self.confusion_matrix = self.confusion_matrix.clone()

for pred, target in zip(
predictions,
instances_from_batch(targets, batch_size=len(predictions)),
14 changes: 13 additions & 1 deletion luxonis_train/attached_modules/visualizers/base_visualizer.py
Original file line number Diff line number Diff line change
@@ -3,8 +3,9 @@
from inspect import Parameter

import torch.nn.functional as F
from luxonis_ml.data.utils import ColorMap
from torch import Tensor
from typing_extensions import TypeVarTuple, Unpack
from typing_extensions import TypeVarTuple, Unpack, override

from luxonis_train.attached_modules import BaseAttachedModule
from luxonis_train.registry import VISUALIZERS
@@ -25,6 +26,13 @@ def __init__(self, *args, scale: float = 1.0, **kwargs) -> None:
super().__init__(*args, **kwargs)
self.scale = scale

@override
Collaborator Author

ColorMap internally uses a generator so it cannot be pickled. Before pickling we remove it from the instance state.
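
A self-contained sketch of the same idea, with hypothetical `Palette` and `Visualizer` classes standing in for `ColorMap` and the visualizer: the cached value is dropped in `__getstate__`, and the `cached_property` rebuilds it on first access after unpickling.

```python
import pickle
from functools import cached_property


class Palette:
    """Stand-in for an object that wraps a generator (hypothetical)."""

    def __init__(self) -> None:
        self._colors = (i for i in range(256))  # generators cannot be pickled


class Visualizer:  # hypothetical; not the project's BaseVisualizer
    def __init__(self, scale: float = 1.0) -> None:
        self.scale = scale

    @cached_property
    def palette(self) -> Palette:
        return Palette()

    def __getstate__(self) -> dict:
        # cached_property stores its value in __dict__; drop it before pickling.
        state = self.__dict__.copy()
        state.pop("palette", None)
        return state


viz = Visualizer(scale=2.0)
_ = viz.palette  # populate the cache
restored = pickle.loads(pickle.dumps(viz))
```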

def __getstate__(self) -> dict:
state = super().__getstate__()
if "colormap" in state:
del state["colormap"]
return state

@staticmethod
def scale_canvas(canvas: Tensor, scale: float = 1.0) -> Tensor:
return F.interpolate(
@@ -34,6 +42,10 @@ def scale_canvas(canvas: Tensor, scale: float = 1.0) -> Tensor:
align_corners=False,
)

@cached_property
def colormap(self) -> ColorMap:
return ColorMap()

@abstractmethod
def forward(
self,