- Module: TOSA (Wave-10 WARDOG #28)
- Paper: Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking (ORTrack)
- arXiv: https://arxiv.org/abs/2504.09228 (v1, 2025-04-12)
- CVPR 2025 open-access paper: https://openaccess.thecvf.com/content/CVPR2025/papers/Wu_Learning_Occlusion-Robust_Vision_Transformers_for_Real-Time_UAV_Tracking_CVPR_2025_paper.pdf
- Official code: https://github.com/wuyou3474/ORTrack
- Verification date: 2026-04-10
Read full local PDF at papers/2504.09228.pdf (11 pages, including method, experiments, references).
Key extracted technical details:
- Core method: ViT single-stream tracker with occlusion-robust representation (ORR) via random template masking.
- Masking mechanism: random masking modeled with a spatial Cox process (Eq. 1 in paper).
- ORR training objective: MSE consistency loss between unmasked vs masked template token features (Eq. 2).
- Distillation: Adaptive Feature-Based Knowledge Distillation (AFKD) with IoU-conditioned weighting (Eq. 3).
- Prediction loss: focal + GIoU + L1, with weighted sum (Eq. 4).
- Training recipe (reported): batch 32, AdamW, lr 4e-4, wd 1e-4, 300 epochs, LR drop at epoch 240.
- Datasets (reported training): GOT-10k, LaSOT, COCO, TrackingNet.
- Benchmarks (reported eval): DTB70, UAVDT, VisDrone2018, UAV123.
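The ORR idea above (random template masking plus an MSE consistency term) can be sketched as follows. This is a minimal stdlib-only illustration, not the paper's Eq. 1 or Eq. 2 and not the upstream code: the gamma-distributed intensity, the per-patch independence, the zero-fill masking, `toy_encoder`, and all function names are assumptions made for the example.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) sample with Knuth's method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def cox_process_mask(grid_h, grid_w, mean_rate=0.3, rng=None):
    """Flat list of booleans (True = masked patch) from a discretised Cox process."""
    if rng is None:
        rng = random.Random()
    mask = []
    for _ in range(grid_h * grid_w):
        # Stage 1: random intensity per patch (gamma distributed, mean = mean_rate).
        # Assumption: intensities are independent across patches.
        intensity = rng.gammavariate(2.0, mean_rate / 2.0)
        # Stage 2: Poisson count given that intensity; any event masks the patch.
        mask.append(sample_poisson(intensity, rng) > 0)
    return mask


def apply_mask(tokens, mask):
    """Replace masked template tokens with zero vectors (assumed masking style)."""
    return [[0.0] * len(t) if m else list(t) for t, m in zip(tokens, mask)]


def toy_encoder(tokens):
    """Hypothetical stand-in for the ViT backbone: two tanh features per token."""
    return [[math.tanh(sum(t)), math.tanh(t[0])] for t in tokens]


def orr_consistency_loss(encode, tokens, mask):
    """MSE between features of the unmasked and the masked template."""
    clean = encode(tokens)
    masked = encode(apply_mask(tokens, mask))
    sq = [(a - b) ** 2 for fc, fm in zip(clean, masked) for a, b in zip(fc, fm)]
    return sum(sq) / len(sq)


rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(64)]  # 8x8 patches
mask = cox_process_mask(8, 8, mean_rate=0.3, rng=rng)
loss = orr_consistency_loss(toy_encoder, tokens, mask)
print(f"masked {sum(mask)}/64 patches, consistency loss {loss:.4f}")
```

The two-stage sampling (random intensity, then Poisson counts) is what makes the mask doubly stochastic; a real ViT encoder would also mix information across tokens, which the toy encoder does not.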
Cloned: repositories/ORTrack.
Validation findings:
- Repository exists and is public.
- README matches paper title and CVPR 2025 claim.
- Code structure contains train/test pipelines and ORTrack implementation files.
- Config files align with paper hyperparameters.
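The config-versus-paper comparison can be made repeatable with a small check like the one below. It is a sketch only: the flat key names (`batch_size`, `lr_drop_epoch`, and so on) are hypothetical and would need mapping from whatever keys the upstream config files actually use; the reference values are the ones reported above.

```python
import math

# Hyperparameters as reported in the paper (see the training recipe above).
PAPER_RECIPE = {
    "batch_size": 32,
    "optimizer": "AdamW",
    "lr": 4e-4,
    "weight_decay": 1e-4,
    "epochs": 300,
    "lr_drop_epoch": 240,
}


def diff_against_paper(config, reference=PAPER_RECIPE, rel_tol=1e-9):
    """Return {key: (expected, actual)} for every value that does not match."""
    mismatches = {}
    for key, expected in reference.items():
        actual = config.get(key)  # None when the key is absent
        if isinstance(expected, float) and isinstance(actual, (int, float)):
            same = math.isclose(actual, expected, rel_tol=rel_tol)
        elif isinstance(expected, str) and isinstance(actual, str):
            same = actual.lower() == expected.lower()
        else:
            same = actual == expected
        if not same:
            mismatches[key] = (expected, actual)
    return mismatches


# Example: a config that differs only in learning rate.
candidate = dict(PAPER_RECIPE, lr=1e-4)
print(diff_against_paper(candidate))
```

An empty result means every tracked hyperparameter agrees with the reported recipe; a missing key shows up with `None` as the actual value.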
Practical runtime check:
- Direct run in a modern Python/Torch environment fails due to legacy dependencies (torch._six import and old stack assumptions).
- Requirements pin torch==1.10.0+cu102, torchvision==0.10.0+cu102, in a python=3.8-style environment.
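One possible compatibility-first workaround for the torch._six failure is a small shim registered before the upstream code is imported. This is a sketch under assumptions: the attribute set below covers names that older torch._six versions exposed, but which of them the ORTrack code actually uses has not been confirmed here, and `build_six_shim` / `install_six_shim` are hypothetical helper names.

```python
import collections.abc
import importlib
import math
import sys
import types


def build_six_shim():
    """Return a stand-in module exposing names the old torch._six provided."""
    shim = types.ModuleType("torch._six")
    shim.string_classes = (str, bytes)
    shim.int_classes = int
    shim.container_abcs = collections.abc
    shim.inf = math.inf
    shim.nan = math.nan
    return shim


def install_six_shim():
    """Register the shim only when torch is present and torch._six is not."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return False  # no torch at all: nothing to patch
    try:
        importlib.import_module("torch._six")
        return False  # legacy torch: the real module still exists
    except ImportError:
        shim = build_six_shim()
        sys.modules["torch._six"] = shim
        torch._six = shim
        return True
```

Calling `install_six_shim()` at the top of an entry script, before any upstream import, would let `from torch._six import string_classes` resolve on a modern stack. It does not address the other old-stack assumptions noted above.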
Shared dataset volume mounted:
/Volumes/AIFlowDev/RobotFlowLabs/datasets/
Found locally:
- shared/coco
- wave10_staging/visdrone exists but is currently empty (download log shows Google Drive quota failures).
Not found in current shared volume scan:
- UAVDT, DTB70, UAV123, full GOT-10k, full LaSOT, full TrackingNet, internal 1.8M UAV dataset path explicitly named for this module.
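The volume scan can be reproduced with a short script that distinguishes missing, empty, and populated dataset directories. This is a sketch: the relative directory names passed in `expected` are placeholders, not the real layout of the shared volume.

```python
from pathlib import Path


def scan_datasets(root, expected):
    """Map each dataset name to 'missing', 'empty', or 'present'.

    root:     base directory of the shared dataset volume
    expected: {dataset name: path relative to root} (placeholder layout)
    """
    root = Path(root)
    report = {}
    for name, rel in expected.items():
        path = root / rel
        if not path.is_dir():
            report[name] = "missing"
        elif not any(path.iterdir()):
            report[name] = "empty"  # directory exists but holds no entries
        else:
            report[name] = "present"
    return report


if __name__ == "__main__":
    # Hypothetical relative paths; adjust to the actual volume layout.
    expected = {"VisDrone": "visdrone", "UAVDT": "uavdt", "DTB70": "dtb70"}
    for name, status in scan_datasets(".", expected).items():
        print(f"{name}: {status}")
```

Reporting "empty" separately from "missing" matters here because a staged directory whose download failed would otherwise look like an available dataset.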
Paper claims (example):
- ORTrack-DeiT on UAVDT: 83.4 Prec / 60.1 Succ.
- ORTrack-DeiT on VisDrone2018: 88.6 Prec / 66.8 Succ.
- ORTrack-DeiT speed: ~206 FPS (GPU); the distilled ORTrack-D variant is reported to run faster.
Plausibility assessment:
- Claims are internally consistent with reported lightweight tiny-backbone architecture.
- Reported ablations are coherent (ORR improves robustness; AFKD improves speed with small accuracy drop).
- Full independent rerun not yet possible in this module due to environment and dataset availability constraints.
External signal gathered:
- Semantic Scholar entry resolves and reports citation activity for this work.
- GitHub repository shows meaningful community engagement (stars/issues/forks).
Status:
- Independent third-party full reproduction package not identified yet.
- Evidence indicates active downstream usage/citation, but not enough to mark the work as fully independently reproduced in our infrastructure.
- Environment fragility: upstream stack is tightly pinned to older CUDA/PyTorch tooling.
- Dataset bottleneck: staged VisDrone download attempts currently blocked by Google Drive quota.
- Weight packaging: teacher model artifact handling in upstream repo is manual and not reproducibility-optimized.
- Verdict: VERIFIED_WITH_RISKS
- Paper is real, method is technically coherent, and code exists.
- Proceed with staged ANIMA integration using a compatibility-first implementation path.
- CTO review flag: YES (due to reproducibility risk from environment and dataset access).