DIETR

Detection and Instance sEgmentation TRansformers

DIETR is a toolbox that contains code to train, validate and use the DIETR model, which comes in two variants: DIETR-msk for instance segmentation and DIETR-box for object detection.

Model	Box AP (IoU 0.50:0.95)	Mask AP (IoU 0.50:0.95)	Trainable parameters
DIETR-box	0.421	n/a	38,425,780
DIETR-msk	0.416	0.356	41,816,244

DIETR-msk

from dietr import DIETR

conf_pth = "__config__/00-base-msk.yaml"
file_pth = "~/data/coco/images/val2017/000000479596.jpg"

# Build the model from its configuration file
model = DIETR(conf_pth=conf_pth)

# Run prediction on a single image file
result_coco: list[dict] = model.predict_on_file(file_pth, plot=True)

DIETR-box

from dietr import DIETR

conf_pth = "__config__/00-base-box.yaml"
file_pth = "coco/images/val2017/000000534827.jpg"

# Build the model from its configuration file
model = DIETR(conf_pth=conf_pth)

# Run prediction on a single image file
result_coco: list[dict] = model.predict_on_file(file_pth, plot=True)
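
Both variants return a `list[dict]`. The following is a minimal sketch of post-processing such a list, assuming each entry follows the COCO results format with a `score` key and a `bbox` key holding `[x, y, width, height]`. Those keys are an assumption and are not confirmed by this README; the helper name `filter_detections` and the sample data are hypothetical.

```python
def filter_detections(results: list[dict], min_score: float = 0.5) -> list[dict]:
    """Keep detections at or above min_score and add corner-format boxes.

    Assumes COCO results-format keys: "bbox" as [x, y, w, h] and "score".
    """
    kept = []
    for det in results:
        if det["score"] < min_score:
            continue
        x, y, w, h = det["bbox"]
        # Copy the entry so the caller's list is left unchanged.
        kept.append({**det, "bbox_xyxy": [x, y, x + w, y + h]})
    return kept


# Hypothetical sample standing in for the output of model.predict_on_file(...)
sample = [
    {"image_id": 1, "category_id": 18, "bbox": [10.0, 20.0, 30.0, 40.0], "score": 0.91},
    {"image_id": 1, "category_id": 1, "bbox": [5.0, 5.0, 10.0, 10.0], "score": 0.12},
]
confident = filter_detections(sample, min_score=0.5)
print(len(confident), confident[0]["bbox_xyxy"])
```

If the real output uses different key names, only the two dictionary lookups need to change.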

Usage

Clone the repository using git:

git clone https://github.com/JPABotermans/dietr.git

Then install all dependencies using uv:

uv sync

Install DIETR for other CUDA toolkit versions

The default installation installs the `nvidia-cudnn-cu13` wheel. If your driver doesn't support that CUDA toolkit version (check with `nvidia-smi`), you can install a different version using the following commands. For CUDA 12.8:

uv sync --extra cu128

For CUDA 12.1:

uv sync --extra cu121

And for CPU only:

uv sync --extra cpu
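
The choice between these extras can be sketched in Python. The helper below reads the `CUDA Version: X.Y` field that `nvidia-smi` prints in its header and maps it to one of the extras named above. The function name `pick_uv_extra` is hypothetical, and falling back to `cpu` when no driver is found or when the driver is older than CUDA 12.1 is an assumption rather than documented DIETR behaviour.

```python
import re
from typing import Optional


def pick_uv_extra(nvidia_smi_output: Optional[str]) -> Optional[str]:
    """Return the name to pass to `uv sync --extra`, or None for the default install.

    Assumption: machines without a usable driver, or with a driver older
    than CUDA 12.1, fall back to the "cpu" extra.
    """
    if not nvidia_smi_output:
        return "cpu"  # no NVIDIA driver output available
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return "cpu"
    version = (int(match.group(1)), int(match.group(2)))
    if version >= (13, 0):
        return None  # plain `uv sync` (default cu13 wheel)
    if version >= (12, 8):
        return "cu128"
    if version >= (12, 1):
        return "cu121"
    return "cpu"


# Example header line in the style printed by nvidia-smi
header = "| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4 |"
print(pick_uv_extra(header))
```

On a real machine the input could come from `subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout`.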

Tuning

To fine-tune a model you need a dataset in COCO format. Change the configuration as shown here, for example in __config__/01-tune-msk.yaml:

n_cls: # set to the number of classes in your dataset
coco_dataset: False
coco_data_dir: "Path to your coco dataset"
trn_ann_file: "path to your annotations.json"
val_ann_file: "path to your annotations.json"
trn_img_root: "/train/"
val_img_root: "/valid/"

pre-trained-model: dietr-msk.pt

uv run python \
    src/dietr/trn.py \
    __config__/02-base-msk-tune.yaml \
    --device "cuda:0"
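
Before starting a fine-tuning run, it can help to sanity-check the annotation file. The sketch below checks the three top-level keys of a COCO annotation file (`images`, `annotations`, `categories`) and counts the categories. The helper names and the tiny sample dataset are hypothetical, and whether `n_cls` should equal the category count exactly is an assumption about DIETR's configuration.

```python
import json
from pathlib import Path


def summarize_coco(ann: dict) -> dict:
    """Check the top-level COCO keys and count images, annotations and classes."""
    for key in ("images", "annotations", "categories"):
        if key not in ann:
            raise ValueError(f"missing top-level key: {key!r}")
    image_ids = {img["id"] for img in ann["images"]}
    # Annotations that point at an unlisted image id usually indicate an export bug.
    orphans = [a["id"] for a in ann["annotations"] if a["image_id"] not in image_ids]
    return {
        "n_images": len(image_ids),
        "n_annotations": len(ann["annotations"]),
        "n_cls": len(ann["categories"]),
        "orphan_annotation_ids": orphans,
    }


def summarize_coco_file(path: str) -> dict:
    """Load an annotation file, such as the one set in trn_ann_file, and summarize it."""
    return summarize_coco(json.loads(Path(path).expanduser().read_text()))


# Tiny hypothetical dataset with two classes and one broken annotation
tiny = {
    "images": [{"id": 1, "file_name": "a.jpg", "width": 64, "height": 64}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1, "bbox": [0, 0, 8, 8]},
        {"id": 11, "image_id": 2, "category_id": 2, "bbox": [4, 4, 8, 8]},
    ],
    "categories": [{"id": 1, "name": "leaf"}, {"id": 2, "name": "stem"}],
}
summary = summarize_coco(tiny)
print(summary)
```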

Training

To train a model from scratch on the COCO dataset:

uv run python \
    src/dietr/trn.py \
    __config__/02-base-msk-tune.yaml

Validation

uv run python \
    src/dietr/val.py \
    __config__/01-base-msk-small-eval.yaml \
    --ckpt dietr-msk.pt

Results

100%|████████████████████████████████████████████████████████| 1250/1250 [07:01<00:00,  2.96it/s]
loading annotations into memory...
Done (t=0.27s)
creating index...
index created!
Loading and preparing results...
DONE (t=0.09s)
creating index...
index created!
Running per image evaluation...
Evaluate annotation type *bbox*
DONE (t=17.17s).
Accumulating evaluation results...
DONE (t=2.46s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.623
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.452
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.253
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.454
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.556
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.327
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.541
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.589
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.414
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.738
Running per image evaluation...
Evaluate annotation type *segm*
DONE (t=19.00s).
Accumulating evaluation results...
DONE (t=2.43s).
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.356
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.584
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.371
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.170
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.396
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.517
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.299
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.462
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.491
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.286
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.536
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.673
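
To compare runs, the summary lines above can be collected into a dictionary. This is a minimal sketch that parses the printed text with a regular expression. The function name `parse_coco_summary` is hypothetical, and the sample string reuses a few lines from the log shown above.

```python
import re

# Matches one summary line, e.g.
# " Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416"
SUMMARY_LINE = re.compile(
    r"Average (?:Precision|Recall)\s+\((AP|AR)\) @\[ IoU=(\S+)\s*\| "
    r"area=\s*(\w+) \| maxDets=\s*(\d+) \] = (-?\d+\.\d+)"
)
TYPE_LINE = re.compile(r"Evaluate annotation type \*(\w+)\*")


def parse_coco_summary(log: str) -> dict:
    """Map annotation type -> {(metric, iou, area, max_dets): value}."""
    metrics: dict = {}
    current = None
    for line in log.splitlines():
        type_match = TYPE_LINE.search(line)
        if type_match:
            current = type_match.group(1)
            metrics[current] = {}
            continue
        m = SUMMARY_LINE.search(line)
        if m and current is not None:
            kind, iou, area, max_dets, value = m.groups()
            metrics[current][(kind, iou, area, int(max_dets))] = float(value)
    return metrics


sample_log = """\
Evaluate annotation type *bbox*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.416
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.623
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.327
Evaluate annotation type *segm*
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.356
"""
metrics = parse_coco_summary(sample_log)
print(metrics["bbox"][("AP", "0.50:0.95", "all", 100)])
print(metrics["segm"][("AP", "0.50:0.95", "all", 100)])
```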

Validation - box

uv run python \
    src/dietr/val.py \
    __config__/01-base-box-small-eval.yaml \
    --ckpt dietr-box.pt

Acknowledgement

VBTI

As an AI engineer at VBTI, I have had the opportunity to work on systems where robots can cut leaves, assess part quality, and make dynamic decisions based on vision. In many of these applications, object detection and instance segmentation are essential building blocks. While they are often just one part of a much larger system, they play a key role in enabling intelligent automation.

What makes this project especially meaningful to me is that it was made possible by VBTI’s culture of innovation and trust. VBTI gave me the freedom, time, and resources to explore this idea over the course of more than a year, encouraging personal initiative and technical curiosity. Even more, the company has been supportive in allowing me to continue this work as an open-source project in my own time — something that reflects a genuine commitment to innovation, knowledge sharing, and supporting employees where possible.

I would also like to especially thank Albert van Breemen, whose creativity and mentorship were a constant source of inspiration.

BOM and TU/e supercomputer center.

Furthermore, I want to acknowledge that this work was only possible thanks to the access granted to SPIKE-1, the supercomputing initiative of the Brabantse Ontwikkelings Maatschappij (BOM). This initiative gave me the opportunity to train models on the newest DGX B200 platform. During the few months I had access to their system, I made more progress than in the year before. I especially want to thank Hengjian Zhang, who onboarded me on the system and taught me how to work on such a state-of-the-art system.

Further

This work was built upon RT-DETR (the head is based on its decoder), and the prototype network principle is based on the work of YOLACT.
