Benchmarking Multi-View BEV Object Detection with Mixed Pinhole and Fisheye Cameras
Xiangzhong Liu, Hao Shen
fortiss GmbH, Munich, Germany
ICRA 2026
Modern autonomous driving systems increasingly rely on mixed camera configurations with pinhole and fisheye cameras for full-view perception. However, Bird's-Eye View (BEV) 3D object detection models are predominantly designed for pinhole cameras, leading to performance degradation under fisheye distortion. To bridge this gap, we introduce a multi-view BEV detection benchmark with mixed cameras by converting KITTI-360 into nuScenes format. Our study encompasses three adaptations: rectification for zero-shot evaluation and fine-tuning of nuScenes-trained models, distortion-aware view transformation modules (VTMs) via the MEI camera model, and polar coordinate representations to better align with radial distortion. We systematically evaluate three representative BEV architectures—BEVFormer, BEVDet and PETR—across these strategies. We demonstrate that projection-free architectures are inherently more robust and effective against fisheye distortion. This work establishes the first real-data 3D detection benchmark with fisheye and pinhole images and provides systematic adaptation and practical guidelines for designing robust and cost-effective 3D perception systems.
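The MEI (unified) camera model referenced above projects a 3D point onto a unit sphere before the perspective divide, which is what lets it describe both pinhole and fisheye lenses. A minimal sketch with made-up intrinsics, omitting the model's radial/tangential distortion terms (function and parameter names are ours, not the repo's):

```python
import math

def mei_project(X, Y, Z, xi, fx, fy, cx, cy):
    """Project a camera-frame 3D point with the MEI unified camera model.

    The point is normalized onto the unit sphere, the projection centre is
    shifted by the mirror parameter xi along the optical axis, and the result
    is mapped through a standard pinhole matrix (distortion terms omitted).
    """
    d = math.sqrt(X * X + Y * Y + Z * Z)  # distance used for the sphere step
    denom = Z + xi * d                    # shifted perspective divide
    u = fx * (X / denom) + cx
    v = fy * (Y / denom) + cy
    return u, v
```

With xi = 0 the model reduces to an ordinary pinhole projection, which is why a single distortion-aware VTM can in principle serve both camera types.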
FishBEVOD/
├── fishBEVDet/ # BEVDet adapted with MEI distortion-aware VTM
├── fishbevformer/ # BEVFormer adapted with MEI distortion-aware VTM
├── fishPETR/ # PETR adapted with MEI distortion-aware 3D PE
├── kitti360_nusc_converter/ # KITTI-360 → nuScenes format conversion pipeline
└── nuscenes-devkit/ # Customized nuScenes devkit (local, for visualization and evaluation)
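The polar BEV representation evaluated in the paper can be illustrated with a small coordinate-binning helper; bin counts and range below are placeholder values, not the repo's actual grid configuration:

```python
import math

def cartesian_to_polar_cell(x, y, r_max=51.2, n_r=64, n_theta=64):
    """Map a BEV point (x, y) in metres to a (radius_bin, angle_bin) cell.

    Illustrative only. Radial/angular bins align naturally with fisheye
    distortion, which grows with the angle away from the optical axis.
    """
    r = math.hypot(x, y)
    theta = math.atan2(y, x)  # in [-pi, pi]
    r_bin = min(int(r / r_max * n_r), n_r - 1)
    t_bin = min(int((theta + math.pi) / (2 * math.pi) * n_theta), n_theta - 1)
    return r_bin, t_bin
```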
The following environment setup has only been tested for PETR and BEVFormer. For BEVDet, please refer to the original environment instructions in fishBEVDet/, since it uses a different version of mmdetection3d.
- Linux (tested on Ubuntu 20.04)
- Python 3.8
- CUDA 11.1
- NVIDIA GPU (experiments run on NVIDIA A5000)
conda create -n fishbevod python=3.8 -y
conda activate fishbevod
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 \
    -f https://download.pytorch.org/whl/torch_stable.html
pip install mmcv-full==1.4.0 \
    -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install mmdet==2.14.0
pip install mmsegmentation==0.14.1
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
git checkout v0.17.1
python setup.py install
cd ..
pip install einops fvcore seaborn iopath==0.1.9 timm==0.6.13 \
    typing-extensions==4.5.0 pylint ipython==8.12 numpy==1.19.5 \
    matplotlib==3.5.2 numba==0.48.0 pandas==1.4.4 \
    scikit-image==0.19.3 setuptools==59.5.0
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'

This repository ships a customized nuScenes devkit with support for fisheye visualization and mixed-camera configurations. Install it locally instead of the PyPI version:
cd nuscenes-devkit/setup
pip install -r requirements.txt
cd ..
pip install -e .
cd ..

Download the KITTI-360 dataset from the official website and place it under data/kitti360/. Since the dataset is converted to nuScenes format, you may also name the directory nuscenes and the vkitti360-trainval split v1.0-trainval if you prefer.
Expected layout:
data/kitti360/
├── calibration/
├── data_2d_raw/
├── data_3d_bboxes/
└── data_poses/
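A quick sanity check for the layout above can save a failed conversion run; this helper is written for this README and is not part of the converter:

```python
import os

# Top-level KITTI-360 directories the conversion pipeline expects.
EXPECTED_DIRS = ["calibration", "data_2d_raw", "data_3d_bboxes", "data_poses"]

def check_kitti360_layout(root):
    """Return the list of expected KITTI-360 subdirectories missing under root."""
    return [d for d in EXPECTED_DIRS if not os.path.isdir(os.path.join(root, d))]
```

An empty return value means the layout matches; otherwise the missing directories are listed.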
cd kitti360_nusc_converter
# Full trainval conversion
python convert_kitti360_to_nuscenes.py \
--kitti360_root ../data/kitti360 \
--output_dir ../data/kitti360
# (Optional) Create a mini split for debugging
python create_mini.py \
--input_dir ../data/kitti360 \
--output_dir ../data/vkitti360-mini
cd ..

The current converter produces a vkitti360-trainval split (~300 scenes, ~68K samples) compatible with the nuScenes devkit.
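In the nuScenes schema that the converter targets, samples within a scene are doubly linked by 32-character hex tokens, with empty strings marking scene boundaries. An illustrative sketch of that linking (not the converter's actual code):

```python
import uuid

def link_samples(timestamps):
    """Build a nuScenes-style doubly linked list of sample records.

    Each record carries a 32-character hex token plus prev/next tokens;
    empty strings mark the first and last sample of a scene, as in the
    real devkit schema.
    """
    samples = [{"token": uuid.uuid4().hex, "timestamp": t, "prev": "", "next": ""}
               for t in timestamps]
    for a, b in zip(samples, samples[1:]):
        a["next"] = b["token"]
        b["prev"] = a["token"]
    return samples
```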
To generate the rectified 6-camera baseline:
cd kitti360_nusc_converter
python rectify_fisheye_images.py \
--kitti360_root ../data/kitti360 \
--output_dir ../data/kitti360
cd ..

cd fishBEVDet
python tools/create_data_bevdet.py \
--root-path ../data/kitti360 \
--out-dir ../data/kitti360
cd ..

All models are trained for 24 or 20 epochs (depending on the model) with a batch size of 2, using AdamW with cosine annealing. Distributed training across multiple NVIDIA GPUs is recommended.
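The cosine-annealed schedule has a simple closed form; a sketch with placeholder learning-rate bounds (the configs define the actual values):

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=2e-4, lr_min=1e-7):
    """Learning rate at a given step under cosine annealing.

    Decays smoothly from lr_max at step 0 to lr_min at total_steps,
    matching the shape of torch.optim.lr_scheduler.CosineAnnealingLR.
    """
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * step / total_steps))
```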
cd fishBEVDet
bash tools/dist_train.sh configs/bevdet/bevdet-r50-xxx.py 2 --validate
cd ..

cd fishbevformer
bash tools/dist_train.sh projects/configs/bevformer/bevformer_small_s_xxx.py 2
cd ..

cd fishPETR
bash tools/dist_train.sh projects/configs/petr/petr_vovnet_gridmask_p4_xxx.py 2
cd ..

# BEVDet
cd fishBEVDet
bash tools/dist_test.sh <config> <checkpoint> 2 --eval bbox
cd ..
# BEVFormer
cd fishbevformer
bash tools/dist_test.sh <config> <checkpoint> 2 --eval bbox
cd ..
# PETR
cd fishPETR
bash tools/dist_test.sh <config> <checkpoint> 2 --eval bbox
cd ..

This project is released under the MIT License.
If you find this work useful, please cite:
@article{liu2026fishbevod,
title = {Benchmarking Multi-View BEV Object Detection with Mixed Pinhole and Fisheye Cameras},
author = {Liu, Xiangzhong and Shen, Hao},
journal = {arXiv preprint arXiv:2603.27818},
year = {2026}
}

This work builds upon the following projects, to which we are grateful:
- KITTI-360 — Liao et al., KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D, TPAMI 2022. The dataset and its rich sensor suite (stereo + fisheye cameras with LiDAR annotations) form the foundation of our benchmark.
- nuscenes-devkit — Caesar et al., nuScenes: A multimodal dataset for autonomous driving, CVPR 2020. We extend the devkit to support fisheye visualization and mixed-camera evaluation.
- PETR — Liu et al., PETR: Position Embedding Transformation for Multi-View 3D Object Detection, ECCV 2022. The projection-free BEV architecture that we adapt with MEI distortion-aware 3D positional encoding and polar coordinates.
- BEVFormer — Li et al., BEVFormer: Learning Bird's-Eye-View Representation from LiDAR-Camera via Spatiotemporal Transformers, TPAMI 2024. The backward-projection BEV architecture adapted with MEI-based spatial cross-attention.
- BEVDet — Huang et al., BEVDet: High-performance Multi-Camera 3D Object Detection in Bird-Eye View, arXiv 2021. The forward-projection BEV architecture adapted with MEI-based depth lifting.