This repository provides sample applications for Multi-View 3D Tracking (MV3DT) with the DeepStream 9.0 SDK. MV3DT is a distributed, real-time, multi-view multi-target 3D tracking framework built for large-scale, calibrated camera networks. It is designed to deliver robust object tracking and identity consistency across complex environments, and it requires camera calibration data for accurate geometric reasoning. The sample applications support three detector models: PeopleNet Transformer, a general-purpose people detection transformer model; PeopleNet v2.6.3, a high-performance people detection model based on DetectNet_v2; and RT-DETR 2D Warehouse, a real-time DETR model optimized for warehouse environments with multiple object classes.
This repository aims to demonstrate MV3DT through live visualization of 3D tracking results, and is structured as follows:
- Prerequisites - System requirements and setup instructions
- Option 1: Sample applications using DeepStream Container - The DeepStream Container has the DeepStream SDK pre-installed. The samples automate MV3DT config generation and launch the DeepStream app inside the container.
- Option 2: Sample applications using Inference Builder - Inference Builder is an open-source tool that automates inference pipeline generation across AI frameworks and packages them as deployable containers. The samples in this repo are solely intended to demonstrate building and running MV3DT using Inference Builder. For additional capabilities, see the Inference Builder README.
- Output Visualization Explanations - Expected visualization from DeepStream On-Screen Display (OSD) and real-time Bird's Eye View (BEV) app
- Receiving 3D Tracking Metadata from Kafka - How to consume MV3DT tracking metadata from Kafka broker for downstream applications
- Customization - How to use MV3DT on custom datasets, and how to convert existing 2D DeepStream tracking pipelines to MV3DT pipeline
As shown in the repo structure, MV3DT can be run using either the DeepStream Container or Inference Builder, and you can choose either approach to run the sample applications. We recommend starting with the DeepStream Container for a quick start, and optionally trying out Inference Builder for advanced use cases, such as integrating MV3DT with other AI frameworks or microservices.
The sample applications in this repository require Ubuntu 24.04 and NVIDIA driver version 580.xx or higher; both x86 and Jetson platforms are supported. A graphical display server (e.g., X11) is required to view visualization results. If no physical display is available, a remote desktop solution such as VNC Viewer can be used as an alternative.
On Jetson Thor, Option 2 Sample 2 (Inference Builder on the 12-camera dataset) may hang due to a file descriptor limitation in the third-party library libmosquitto. This issue is planned to be fixed in the next release.
On DGX Spark, the RT-DETR model requires TensorRT strongly-typed mode to produce valid inference outputs. Without it, detections may be missing and no bounding boxes will be shown. If you are running on DGX Spark with the RT-DETR detector, add `strongly-typed=1` to the `[property]` section of `config_templates/config_pgie_rt_detr.txt` before launching the pipeline.
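If you prefer to script this edit rather than make it by hand, the Python sketch below is one way to do it. It is a hypothetical helper, not part of this repo, and it assumes the file uses the usual DeepStream `key=value` INI layout. It inserts `strongly-typed=1` directly after the `[property]` header and drops any existing active `strongly-typed` entry in that section. It edits the text line by line instead of using `configparser`, which would discard comments when writing the file back.

```python
import re
from pathlib import Path

# Matches an INI section header such as "[property]".
SECTION_RE = re.compile(r"^\s*\[(?P<name>[^\]]+)\]\s*$")
# Matches an active (uncommented) strongly-typed entry.
KEY_RE = re.compile(r"^\s*strongly-typed\s*=")


def add_strongly_typed(text: str, section: str = "property") -> str:
    """Return `text` with strongly-typed=1 placed right after the [section] header."""
    out = []
    in_section = False
    inserted = False
    for line in text.splitlines():
        header = SECTION_RE.match(line)
        if header:
            in_section = header.group("name").strip() == section
            out.append(line)
            if in_section and not inserted:
                out.append("strongly-typed=1")
                inserted = True
            continue
        if in_section and KEY_RE.match(line):
            continue  # drop any previous strongly-typed value in this section
        out.append(line)
    if not inserted:
        raise ValueError(f"section [{section}] not found")
    return "\n".join(out) + "\n"


def patch_file(path: str = "config_templates/config_pgie_rt_detr.txt") -> None:
    """Rewrite the config file in place (default path taken from this README)."""
    p = Path(path)
    p.write_text(add_strongly_typed(p.read_text()))
```

Running `patch_file()` from the `deepstream-tracker-3d-multi-view` directory applies the change; running it twice leaves a single `strongly-typed=1` line.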
On B200, the sample apps with the PeopleNetTransformer model may occasionally crash with a segmentation fault during inference. If this occurs, add `-e MALLOC_CHECK_=3` to the `docker run` command in the launch scripts to mitigate the issue.
- Please check DeepStream Container Prerequisites for DeepStream container setup.
- Git clone the current `deepstream_reference_apps` repository to the host machine and enter the `deepstream-tracker-3d-multi-view` directory:
  ```bash
  # Install Git LFS
  sudo apt install git-lfs
  git lfs install
  git clone https://github.com/NVIDIA-AI-IOT/deepstream_reference_apps.git
  cd deepstream_reference_apps/deepstream-tracker-3d-multi-view
  git lfs pull  # In case repo is already cloned before installing git-lfs
  ```
- Run the automated prerequisites setup script. The script takes about 10-20 minutes to complete and may prompt you for your sudo password and GitHub credentials. After the initial setup, if you close the session or restart your machine, you can rerun this script to set up and check the prerequisites again; later runs complete much faster.
  ```bash
  # For DeepStream Container only (Option 1)
  ./scripts/setup_prerequisites.sh

  # Additionally, if you want to try out Inference Builder (Option 2)
  USE_INFERENCE_BUILDER=true ./scripts/setup_prerequisites.sh

  # [Expected output]: For both options, you should see all items checked with "✓" under "PREREQUISITES CHECK SUMMARY", and in the last line you should see:
  # [SUCCESS] Prerequisites check passed! You're ready to use MV3DT.
  ```
Environment Variables:
- `USE_INFERENCE_BUILDER` - Enable Inference Builder setup (default: `false`, DeepStream Container only)
- `BASE_DIR` - Base directory for Kafka and Inference Builder installations (default: `$HOME`)
- `DEEPSTREAM_IMAGE` - DeepStream Docker image (default: `nvcr.io/nvidia/deepstream:9.0-triton-multiarch` for x86 and Jetson platforms)
- Use case 1: If you want to use a base directory other than `$HOME` for the Kafka and Inference Builder installations, set the `BASE_DIR` environment variable before running the script.
  ```bash
  export BASE_DIR=/path/to/your/preferred/base/directory
  # If you want to use Inference Builder (Option 2), uncomment the lines below
  # export USE_INFERENCE_BUILDER=true
  # export INFERENCE_BUILDER_DIR="$BASE_DIR/inference_builder"
  ./scripts/setup_prerequisites.sh
  ```
- Use case 2: On ARM SBSA platforms, the DeepStream Docker image differs from the default one. Set the `DEEPSTREAM_IMAGE` environment variable before running the script.
  ```bash
  export DEEPSTREAM_IMAGE=nvcr.io/nvidia/deepstream:9.0-triton-arm-sbsa
  # If you want to use Inference Builder (Option 2), uncomment the line below
  # export USE_INFERENCE_BUILDER=true
  ./scripts/setup_prerequisites.sh
  ```
For manual setup, troubleshooting, or shutdown instructions for the Kafka and MQTT brokers, see: Manual Setup Instructions
The following examples demonstrate running MV3DT using the DeepStream container. Note that the configurations are auto-generated from `config_templates` using the auto-configurator.
Run the provided script to quickly launch the 4-camera DeepStream pipeline.
```bash
cd <path to current repo, i.e. deepstream-tracker-3d-multi-view>
sudo xhost +  # give container access to display, only need to run once per session
# [expected output]: access control disabled, clients can connect from any host
# chmod +x scripts/test_4cam_ds.sh
./scripts/test_4cam_ds.sh
# To use RT-DETR detector instead of the default PeopleNetTransformer:
# DETECTOR_MODEL=RTDETR ./scripts/test_4cam_ds.sh
# To use PeopleNet v2.6.3 detector:
# DETECTOR_MODEL=PeopleNet2.6.3 ./scripts/test_4cam_ds.sh
```

Two separate windows will be launched: one named Bird-Eye View of Multi-View 3D Tracking, and the other named DeepStreamTest5App. You may need to toggle, arrange, or resize the windows to see both views. If anything goes wrong or the windows do not appear, follow the step-by-step instructions below; otherwise, the quick-start script covers the same processes.
Note 1 (Important): When the script is launched for the first time, it creates the model engine files used by MV3DT, which may take about 15 minutes depending on the hardware platform. This is needed only once for each dataset; subsequent runs reuse the generated engine files and launch immediately.
Note 2: The following warnings are expected and do not affect the accuracy or performance of the pipeline:
- Load engine failed. Create engine again.
- INT8 calibration file not specified. Trying FP16 mode.
- GStreamer-WARNING
Window 1: Bird-Eye View of Multi-View 3D Tracking
- This window shows the bird's-eye view of the multi-view 3D tracking results. It shows a blank map at the beginning.
- The BEV visualization window will not exit automatically. To close the window, select the window and press 'q'.
Window 2: DeepStreamTest5App
- This window shows the DeepStream on-screen display (OSD) of the 4 camera views in a grid. It comes directly from the DeepStream pipeline and shows a black screen at the beginning.
- To view a specific camera view in the DeepStream OSD window, left-click on the desired view. To return to the multi-camera grid view, simply right-click anywhere in the window.
- The DeepStreamTest5App window exits automatically. To quit it early, select the window and press 'q'. The app then terminates within a few seconds, and "App run successful" is printed.
- Note about window name: MV3DT is built upon DeepStreamTest5App with specific config files, which is why the window displays "DeepStreamTest5App" as its title. This base application provides essential IoT protocol support (Kafka and MQTT) required by MV3DT. Currently, the window name cannot be changed. For more details, see the DeepStreamTest5App documentation.
Expected output during engine generation period from window overview/spread:
Expected output after pipeline starts:
For detailed step-by-step instructions, see DeepStream Container: Step-by-step Instructions.
Run the provided script to quickly launch the 12-camera DeepStream pipeline. For detailed window explanations and important notes, see the 4-camera Quick Start section above.
```bash
# sudo xhost +  # give container access to display
# chmod +x scripts/test_12cam_ds.sh
./scripts/test_12cam_ds.sh
# To use RT-DETR detector instead of the default PeopleNetTransformer:
# DETECTOR_MODEL=RTDETR ./scripts/test_12cam_ds.sh
# To use PeopleNet v2.6.3 detector:
# DETECTOR_MODEL=PeopleNet2.6.3 ./scripts/test_12cam_ds.sh
```

For detailed step-by-step instructions, see DeepStream Container: Step-by-step Instructions.
Inference Builder is a tool that automatically generates inference pipelines and integrates them into either a microservice or a standalone application. In the samples in this repo, it is used to generate a Python package containing the MV3DT inference flow.
Compared to traditional DeepStream configuration files, the Inference Builder configuration (e.g., config_templates/ds_mv3dt.yaml) is significantly simpler and more concise, making it easier to define and modify.
Run the provided script to quickly start the 4-camera DeepStream Inference Builder pipeline.
```bash
# sudo xhost +  # give container access to display
# If you changed BASE_DIR in the prerequisites setup, you need to export INFERENCE_BUILDER_DIR to the correct path
# export INFERENCE_BUILDER_DIR=<path-to-inference-builder-repo>
# chmod +x scripts/test_4cam_ib.sh
./scripts/test_4cam_ib.sh
# To use RT-DETR detector instead of the default PeopleNetTransformer:
# DETECTOR_MODEL=RTDETR ./scripts/test_4cam_ib.sh
# To use PeopleNet v2.6.3 detector:
# DETECTOR_MODEL=PeopleNet2.6.3 ./scripts/test_4cam_ib.sh
```

For detailed window explanations and important notes, see the 4-camera DeepStream Quick Start section above. Additional notes:
Note 1: By default, when no data is being streamed, the application waits 1000 seconds before exiting gracefully. Seeing 0 FPS printed during this wait is expected. The waiting time is controlled by the `inference_timeout` parameter in `config_templates/ds_mv3dt.yaml`. To avoid this delay:
- Option 1: Pre-generate the engine files and place them in the correct directories before starting the pipeline. Then set `inference_timeout` to a lower value (e.g., 30 seconds).
- Option 2: Keep the default 1000-second timeout for the initial run to allow engine file creation. For later runs, reduce `inference_timeout` (e.g., to 30 seconds) so the container exits promptly after inference completes.
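Lowering the timeout is a one-line change that you can make by hand; the hypothetical Python helper below does the same thing and is not part of this repo. It assumes `inference_timeout` is written on a single line as `inference_timeout: <integer>`, possibly followed by a comment. It rewrites only that value, so indentation, ordering, and comments in the rest of the YAML file stay untouched.

```python
import re
from pathlib import Path

# Assumes the key sits on one line as "inference_timeout: <integer>", optionally
# followed by a trailing comment; indentation (YAML nesting) is preserved.
TIMEOUT_RE = re.compile(
    r"^(?P<indent>[ \t]*)inference_timeout[ \t]*:[ \t]*\d+(?P<rest>.*)$", re.MULTILINE
)


def set_inference_timeout(yaml_text: str, seconds: int) -> str:
    """Return yaml_text with every inference_timeout value replaced by `seconds`."""
    new_text, count = TIMEOUT_RE.subn(
        lambda m: f"{m.group('indent')}inference_timeout: {seconds}{m.group('rest')}",
        yaml_text,
    )
    if count == 0:
        raise ValueError("inference_timeout not found")
    return new_text


def patch_file(path: str = "config_templates/ds_mv3dt.yaml", seconds: int = 30) -> None:
    """Rewrite the Inference Builder config in place."""
    p = Path(path)
    p.write_text(set_inference_timeout(p.read_text(), seconds))
```

For example, run `patch_file(seconds=30)` after the first run has created the engine files (Option 2 above).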
Note 2: To quit the app early, run this command in a separate terminal to stop the container:
```bash
docker ps -q --filter "ancestor=inference-builder-mv3dt" | xargs docker stop
```
If you cannot see the camera views or the BEV view, follow the step-by-step instructions; otherwise, the quick-start script covers the same processes.
Expected output from window overview/spread:
For detailed step-by-step instructions, see Inference Builder: Step-by-step Instructions.
Run the provided script to quickly start the 12-camera DeepStream Inference Builder pipeline.
For detailed window explanations and important notes, see the 4-camera DeepStream Quick Start section above.
```bash
# sudo xhost +  # give container access to display
# If you changed BASE_DIR in the prerequisites setup, you need to export INFERENCE_BUILDER_DIR to the correct path
# export INFERENCE_BUILDER_DIR=<path-to-inference-builder-repo>
# chmod +x scripts/test_12cam_ib.sh
./scripts/test_12cam_ib.sh
# To use RT-DETR detector instead of the default PeopleNetTransformer:
# DETECTOR_MODEL=RTDETR ./scripts/test_12cam_ib.sh
# To use PeopleNet v2.6.3 detector:
# DETECTOR_MODEL=PeopleNet2.6.3 ./scripts/test_12cam_ib.sh
```

For detailed step-by-step instructions, see Inference Builder: Step-by-step Instructions.
Whether you use Option 1 (DeepStream Container) or Option 2 (Inference Builder), both approaches launch two windows with similar visualizations. This section explains and demonstrates the content of these visualization windows.
When the pipeline is launched, DeepStream displays the output video, as shown below, while processing the input video. In the example frames below, objects detected across different cameras are assigned globally consistent IDs, and both 2D and 3D bounding boxes are visualized for each tracked object.
In the Inference Builder OSD window, object IDs are visible directly in the grid view. In the DeepStream Container OSD window, object IDs are visible only when viewing a single camera. To enter the single-camera view with object IDs, left-click the desired camera view; to return to the multi-camera grid view, right-click anywhere in the window.
Example 1: 4-cam dataset, with PeopleNetTransformer, Inference Builder OSD
Example 2: 12-cam dataset, with RT-DETR, DeepStream OSD
Note: a forklift is also detected in the 2nd camera view (row 1, column 2).
If you don't need on-screen display, you can disable it:
- For DeepStream Container:
  - Remove the `--enable-osd` option from the `deepstream_auto_configurator.py` command in the quick-start script and run the script again.
- For Inference Builder:
  - Comment out the `render_config` section in `config_templates/ds_mv3dt.yaml` and run the quick-start script again.
The kafka_bev_visualizer.py script provides real-time bird's-eye view (BEV) visualization of 3D tracking data streamed via Kafka.
Note that the BEV visualization script should be started before the MV3DT app.
- Command:
  ```bash
  python utils/kafka_bev_visualizer.py \
      --dataset-path=$DATASET_DIR \
      --msgconv-config=$EXPERIMENT_DIR/config_msgconv.txt \
      --show-ids \
      --average-multi-cam
  ```
- Expected outputs (left: 4-camera, right: 12-camera)
- Note that the BEV visualization shows fused tracks, where trajectories of the same object from multiple cameras are averaged into one. If you want to see the individual trajectories from every camera, remove the `--average-multi-cam` option from the command. The output will then look like this:
MV3DT streams tracking metadata (frame ID, sensor ID, object IDs, 3D bounding boxes, etc.) to a Kafka topic as protobuf messages. The `kafka_client.py` script demonstrates how to connect to the Kafka broker, deserialize the protobuf messages, and print them as JSON. It can serve as a reference implementation for building downstream applications that use MV3DT tracking metadata.
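If you are writing your own downstream consumer in Python, the consuming loop can be quite small. The sketch below is an illustration, not the implementation in `utils/kafka_client.py`. It assumes the `kafka-python` package is installed, and it leaves protobuf decoding to a `decode` function that you supply. One option is to parse the bytes with whichever generated message class `kafka_client.py` uses and convert the result with `google.protobuf.json_format.MessageToDict`. The helper names `iter_decoded` and `consume` are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator


def iter_decoded(
    payloads: Iterable[bytes], decode: Callable[[bytes], Dict[str, Any]]
) -> Iterator[Dict[str, Any]]:
    """Decode each raw Kafka payload, skipping ones that fail to parse."""
    for raw in payloads:
        try:
            yield decode(raw)
        except Exception as exc:  # one malformed message should not stop the consumer
            print(f"skipping undecodable message: {exc}")


def consume(
    decode: Callable[[bytes], Dict[str, Any]],
    broker: str = "localhost:9092",
    topic: str = "mv3dt",
) -> None:
    """Print every decoded message from `topic` as JSON (runs until interrupted)."""
    # Imported here so iter_decoded can be used and tested without kafka-python.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(topic, bootstrap_servers=broker, auto_offset_reset="latest")
    payloads = (record.value for record in consumer)  # record.value is raw bytes
    for message in iter_decoded(payloads, decode):
        print(json.dumps(message))
```

The defaults match the provided script's defaults (`localhost:9092`, topic `mv3dt`). Keeping the decoding step pluggable lets the same loop feed a database, a dashboard, or an analytics service.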
```bash
source mv3dt_venv/bin/activate
# Default: connects to localhost:9092, topic 'mv3dt'
python utils/kafka_client.py
# Custom broker and topic
python utils/kafka_client.py --broker localhost:9092 --topic mv3dt
```

This section provides customization options for the MV3DT pipeline. If you are new to DeepStream and want to try MV3DT on your own dataset, see Running MV3DT on Custom Datasets. If you already have a working 2D DeepStream tracking pipeline, see Converting your Existing 2D DeepStream Tracking Pipeline to MV3DT for a simple conversion to a multi-view 3D pipeline.
Requirements:
- Multi-view video streams must be synchronized.
- All video streams must have the same resolution.
- Camera calibration parameters (projection matrices) must be available.
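The resolution requirement is easy to check before configuring anything. The sketch below is a hypothetical helper that is not part of this repo. It assumes OpenCV (`opencv-python`) for probing and `.mp4` files under `videos/` as in the layout below. It reports every video whose width and height differ from the most common resolution. It does not check synchronization, which cannot be inferred from file metadata alone.

```python
from collections import Counter
from pathlib import Path
from typing import Dict, Tuple


def mismatched_resolutions(res: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Return the videos whose (width, height) differs from the most common one."""
    if not res:
        return {}
    reference, _ = Counter(res.values()).most_common(1)[0]
    return {name: wh for name, wh in res.items() if wh != reference}


def probe_resolutions(video_dir: str) -> Dict[str, Tuple[int, int]]:
    """Read (width, height) of every .mp4 in video_dir using OpenCV."""
    import cv2  # imported lazily so mismatched_resolutions works without OpenCV

    result = {}
    for path in sorted(Path(video_dir).glob("*.mp4")):
        cap = cv2.VideoCapture(str(path))
        if not cap.isOpened():
            raise RuntimeError(f"cannot open {path}")
        result[path.name] = (
            int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
            int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        )
        cap.release()
    return result
```

For example, `mismatched_resolutions(probe_resolutions("your_dataset/videos"))` should return an empty dict for a valid dataset.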
- Organize your dataset with the following structure:
  ```text
  your_dataset/
  ├── videos/
  │   ├── camera1.mp4
  │   ├── camera2.mp4
  │   └── ...
  ├── camInfo/
  │   ├── camera1.yml
  │   ├── camera2.yml
  │   └── ...
  ├── map.png          (optional, for BEV visualization)
  └── transforms.yml   (optional, for BEV visualization)
  ```
- Create camera calibration files following the format of `datasets/mtmc_4cam/camInfo/Warehouse_Synthetic_Cam001.yml`. Replace the `projectionMatrix_3x4_w2p` values with your camera's projection matrix. For more details about these files, refer to the Single-View 3D Tracking and The 3x4 Camera Projection Matrix sections of the DeepStream documentation.
- Optional: BEV visualization setup - Prepare a BEV map image and create a `transforms.yml` file specifying the projection matrix that maps world coordinates (in meters) to BEV image coordinates, following the sample format in `datasets/mtmc_4cam/transforms.yml`.
- Generate configurations using the auto-configurator. Refer to `scripts/test_4cam_ds.sh` for the exact Python command and environment variables needed.
- Launch the MV3DT pipeline using your generated configs. Refer to `scripts/test_4cam_ds.sh` for the exact Docker command and environment variables needed.
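As a sanity check on a `projectionMatrix_3x4_w2p` you are about to enter, you can project a known world point and confirm that it lands where you expect in the image. The sketch below uses the standard pinhole relation, where the homogeneous pixel is P · [X, Y, Z, 1]ᵀ and the pixel is (u/w, v/w). Refer to the DeepStream documentation for the exact conventions behind the calibration file. The function name is hypothetical, and the example matrix is a made-up K[I|0] with focal length 1000 px and principal point (960, 540), not real calibration data. The `transforms.yml` world-to-BEV matrix presumably follows the same homogeneous idea, but check `datasets/mtmc_4cam/transforms.yml` for its actual shape.

```python
from typing import Sequence, Tuple

Matrix3x4 = Sequence[Sequence[float]]


def project_world_to_pixel(P: Matrix3x4, X: float, Y: float, Z: float) -> Tuple[float, float]:
    """Project world point (X, Y, Z) to pixel (u, v) with a 3x4 projection matrix."""
    point = (X, Y, Z, 1.0)  # homogeneous world point
    u_h, v_h, w = (sum(P[r][c] * point[c] for c in range(4)) for r in range(3))
    if abs(w) < 1e-12:
        raise ValueError("point projects to infinity (w == 0)")
    return u_h / w, v_h / w


# Made-up example matrix: focal length 1000 px, principal point (960, 540).
P_example = [[1000, 0, 960, 0], [0, 1000, 540, 0], [0, 0, 1, 0]]
```

With `P_example`, the point (1, 0.5, 10) projects to (1060.0, 590.0). A point on the optical axis, such as (0, 0, 5), lands on the principal point (960.0, 540.0).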
The auto-configurator generates a complete set of MV3DT config files tailored to your dataset:
- Detects dataset structure - Scans your `videos/` and `camInfo/` directories to determine the camera count and calibration files
- Generates pipeline configuration - Creates `config_deepstream.txt` with appropriate source URIs, batch sizes, and display grid layout based on the detected cameras
- Creates inter-camera communication configs - Generates `pub_sub_info_config_0.yml` defining camera neighbor relationships for multi-view data sharing
- (Optional) Applies overrides - Uses override files to customize tracker settings for specific datasets or requirements
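To make the first two steps concrete, here is a rough sketch of what "detect dataset structure" and "display grid layout" could involve. It is not the auto-configurator's code. It assumes each `videos/<name>.mp4` is paired with `camInfo/<name>.yml` by file stem, and it picks a near-square grid (columns = ⌈√n⌉, rows = ⌈n / columns⌉). The real tool may use different matching and layout rules. Both function names are hypothetical.

```python
import math
from pathlib import Path
from typing import Dict, Tuple


def pair_cameras(dataset_dir: str) -> Dict[str, Tuple[Path, Path]]:
    """Map camera name -> (video path, calibration path), matched by file stem."""
    root = Path(dataset_dir)
    videos = {p.stem: p for p in (root / "videos").glob("*.mp4")}
    calibs = {p.stem: p for p in (root / "camInfo").glob("*.yml")}
    unmatched = sorted(set(videos) ^ set(calibs))  # names present on only one side
    if unmatched:
        raise ValueError(f"video/calibration mismatch for: {unmatched}")
    return {name: (videos[name], calibs[name]) for name in sorted(videos)}


def tile_grid(num_cameras: int) -> Tuple[int, int]:
    """Return (rows, columns) for a near-square grid holding num_cameras views."""
    if num_cameras < 1:
        raise ValueError("need at least one camera")
    cols = math.ceil(math.sqrt(num_cameras))
    rows = math.ceil(num_cameras / cols)
    return rows, cols
```

Under this rule, 4 cameras give a 2x2 grid, 6 give 2 rows by 3 columns, and 12 give 3 rows by 4 columns.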
For detailed usage and all available options, see the Auto-Configurator Documentation.
If you have an existing 2D detection and tracking pipeline using DeepStream, the auto-configurator can automatically enhance it to support multi-view 3D tracking, provided you have camera calibration files available.
- Organize your dataset as in the previous Running MV3DT on Custom Datasets section.
- Create camera calibration files as in the previous section.
- Optional: BEV visualization setup (required if you want to enable BEV visualization)
  - Follow the instructions in the previous section.
- Optional: Set up your model configs (required if using a detector other than PeopleNetTransformer)
  - For example, if you are using PeopleNet as your detector, create a `PeopleNet` folder under `models` and modify `config_templates/config_pgie.txt` to point to your model files.
- Optional: Set up your DeepStream pipeline config (required if using a custom DeepStream pipeline config)
  - Modify `config_templates/config_deepstream.txt` based on your use case. For example, if you have a custom `[pre-process]` section, copy that section to `config_templates/config_deepstream.txt`. Note that the `[source%d]` and `[sink%d]` sections are handled by the auto-configurator.
- Place your tracker config file in the `config_templates` folder.
- Generate MV3DT configs
  - Use the auto-configurator with the `--tracker-config` argument. For example, if your tracker config file is named `config_tracker_custom_2d.yml`, pass `--tracker-config=config_tracker_custom_2d.yml` to the auto-configurator. The auto-configurator assumes all template configs are located in the `config_templates` folder, so only the file name is needed, not the full path.
- Launch the MV3DT pipeline using your generated configs.
- To simulate a custom dataset, let's create a 6-camera subset (randomly selected) from the 12-camera dataset. Run the following command to generate a new dataset in `datasets/mtmc_6cam`.
  ```bash
  ./scripts/create_6cam_subset.sh
  ```
- Assume you have an existing 2D tracking config file. In this example, we use the `config_tracker_2d.yml` file in the `config_templates` folder.
- Set up output directories and run the auto-configurator to generate the MV3DT config files.
  ```bash
  export DATASET_DIR=$PWD/datasets/mtmc_6cam/
  export EXPERIMENT_DIR=$PWD/experiments/deepstream/6cam
  mkdir -p $EXPERIMENT_DIR/infer-kitti-dump
  mkdir -p $EXPERIMENT_DIR/tracker-kitti-dump
  python utils/deepstream_auto_configurator.py \
      --dataset-dir=$DATASET_DIR \
      --tracker-config=config_tracker_2d.yml \
      --enable-msg-broker \
      --enable-osd \
      --output-dir=$EXPERIMENT_DIR
  ```
- Launch the MV3DT pipeline:
  ```bash
  export MODEL_REPO=$PWD/models
  docker run -t --privileged --rm --net=host --runtime=nvidia \
      -v $MODEL_REPO:/workspace/models \
      -v $DATASET_DIR:/workspace/inputs \
      -v $EXPERIMENT_DIR:/workspace/experiments \
      -v /tmp/.X11-unix/:/tmp/.X11-unix \
      -e DISPLAY=$DISPLAY \
      -w /workspace/experiments \
      nvcr.io/nvidia/deepstream:9.0-triton-multiarch \
      deepstream-test5-app -c config_deepstream.txt
  ```
- For convenience, the same process is automated in the following scripts:
  ```bash
  # Create the 6-camera dataset if not already done
  # ./scripts/create_6cam_subset.sh
  # Option 1: using DeepStream Container
  ./scripts/test_custom_2d_tracker_ds.sh
  # Option 2: using Inference Builder
  ./scripts/test_custom_2d_tracker_ib.sh
  ```
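The repo builds the 6-camera subset with `scripts/create_6cam_subset.sh`. The Python sketch below shows the same idea for readers who want to carve out their own subsets. It is a hypothetical analogue, not a port of that script. It assumes the `videos/<name>.mp4` and `camInfo/<name>.yml` layout described earlier, samples `k` cameras with an optional seed for reproducibility, and copies the optional `map.png` and `transforms.yml` when present.

```python
import random
import shutil
from pathlib import Path
from typing import List, Optional


def create_subset(src: str, dst: str, k: int = 6, seed: Optional[int] = None) -> List[str]:
    """Copy k randomly chosen cameras (video + calibration) from src to dst."""
    src_root, dst_root = Path(src), Path(dst)
    names = sorted(p.stem for p in (src_root / "videos").glob("*.mp4"))
    if k > len(names):
        raise ValueError(f"requested {k} cameras but only {len(names)} available")
    chosen = sorted(random.Random(seed).sample(names, k))
    (dst_root / "videos").mkdir(parents=True, exist_ok=True)
    (dst_root / "camInfo").mkdir(parents=True, exist_ok=True)
    for name in chosen:
        shutil.copy2(src_root / "videos" / f"{name}.mp4", dst_root / "videos")
        shutil.copy2(src_root / "camInfo" / f"{name}.yml", dst_root / "camInfo")
    # Optional BEV assets are shared by all cameras, so copy them unchanged.
    for extra in ("map.png", "transforms.yml"):
        if (src_root / extra).exists():
            shutil.copy2(src_root / extra, dst_root)
    return chosen
```

Passing a fixed `seed` makes the selection repeatable, which helps when comparing tracker configs on the same subset.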
The auto-configurator generates a complete set of MV3DT config files based on your custom 2D tracker, pipeline, and model configs, as well as your dataset:
- Detects dataset structure - Scans your `videos/` and `camInfo/` directories to determine the camera count and calibration files
- Generates pipeline configuration - Uses your updated `config_templates/config_deepstream.txt` as a template and generates `$EXPERIMENT_DIR/config_deepstream.txt` with appropriate source URIs, batch sizes, and display grid layout
- Extends 2D tracker configs with additional MV3DT sections - Automatically injects the MV3DT sections (`ObjectModelProjection` for 3D model projection, `MultiViewAssociator` for multi-view association, and `Communicator` for inter-camera communication) into your 2D tracker config
- Creates inter-camera communication configs - Generates `$EXPERIMENT_DIR/pub_sub_info_config_0.yml` defining camera publish/subscribe relationships for multi-view data sharing
- (Optional) Applies overrides - Uses override files to customize tracker settings for specific datasets or requirements
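The "extends 2D tracker configs" step can be pictured as a dictionary merge on the parsed tracker YAML. In the sketch below, the three section names come from this README, but their contents do not. The section bodies come from a caller-supplied `templates` dict, which stands in for whatever defaults the real auto-configurator uses. The function name and the choice to keep any section that already exists in the 2D config are assumptions made for illustration.

```python
import copy
from typing import Any, Dict

# Section names taken from this README; their parameters are NOT modeled here.
MV3DT_SECTIONS = ("ObjectModelProjection", "MultiViewAssociator", "Communicator")


def extend_tracker_config(
    cfg_2d: Dict[str, Any], templates: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a copy of a parsed 2D tracker config with the MV3DT sections added."""
    cfg = copy.deepcopy(cfg_2d)  # never mutate the caller's 2D config
    for section in MV3DT_SECTIONS:
        if section not in templates:
            raise KeyError(f"no template for section {section}")
        # Keep a section the user already wrote; otherwise copy in the template body.
        cfg.setdefault(section, copy.deepcopy(templates[section]))
    return cfg
```

Working on the already-parsed dict keeps the sketch independent of any particular YAML library, and deep copies ensure that editing the result never changes the original 2D config or the templates.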
For detailed usage and all available options, see the Auto-Configurator Documentation.
For more details on the Python utility scripts, including the auto-configuration generators and visualization tools, see 📁 Python Util Scripts Documentation.